I. INTRODUCTION

Around 1964 a new term appeared in the computer literature to
denote a new concept. The term was "database," and it was to
play a highly significant role in an organization's information
system. The information system supports the organization's
functions, maintaining the data for these functions and
assisting users in interpreting the data for decision making.
The database is an important tool in this process; it is the
container of the data in the information system.

In many information systems, "database" denotes collections of
data shared by end-users of computer systems. Users who make
decisions obtain data by accessing the database and then record
their decisions in it. Easy access to a variety of data from a
number of locations enables the information system to respond
quickly to the needs of decision makers within the organization,
whereas poor access hinders rapid response. If the data are not
readily available, decisions may be either delayed unnecessarily
or made with incomplete data, leading to possible system
malfunction in the future.

The flexibility of the database structures is a very important
feature in meeting changing organizational needs. As new
functions arise in an organization, new decisions follow in
their wake. Since the database will need to store new data and
accommodate new relationships to support the new decisions, it
must include facilities that allow such changes to be made
easily. [Ref. 1:pp. 1-3]

Today, computer applications in which many users at terminals
concurrently access a database are called "database
applications" [Ref. 2]. A significant new kind of software, the
database management system, or DBMS, has evolved to facilitate
the development of database applications. The development of
DBMS, in turn, has given rise to new languages, algorithms, and
software techniques which together make up what might be called
a database technology.

Database technology has been driven by, and to a large extent
distinguished from other software technologies by, the
following broad user requirements:
.Data consolidation
.Data independence
.Data protection

In the years ahead, database systems will become increasingly
widespread and increasingly important. At present, however, they
represent a new and relatively unexplored field, despite the
fact that the number of systems installed or under development
is growing rapidly.

The primary goal of this thesis is to present the design steps
of a particular database system, the design criteria, and the
elements of the database system which give designers the
ability to evaluate databases against these criteria. The second
objective is to show the implementation of that database system,
which controls and executes the transactions written in a
model-based database language such as a Data Definition Language
(DDL) and a Data Manipulation Language (DML). Finally, the third
objective is to introduce essential features of the
maintainability, administration, and security of a database
management system.

Chapter II describes the basic concepts of database, including
the definition of a database system (DBS), its components, its
architecture, and some advantages and disadvantages. Chapter III
briefly reviews the design objectives and techniques of a
database and describes logical and physical database design.
Chapter IV also briefly addresses database models which can be
used to form a logical framework of a database, to support
further design phases, and/or to create the intended database
structure which will be implemented after the completion of the
design phases. Chapter V introduces, in detail, the Semantic
Database Model (SDM) for a personnel assignment database.
Chapters VI and VII describe the design of the personnel
database using the Relational Database Model approach, which is
one of the three database models. In addition, rules, design
criteria, and important operations associated with this model
are given. Chapter VII also shows how the designer can transform
the SDM model which has been designed for a personnel database
system into a relational database model. The INGRES Database
Management System, which is available today, is discussed in
Chapter VIII. Chapter IX demonstrates the implementation of the
relational database system on the VAX computer systems using the
ORACLE Relational DBMS. Chapter X describes the functions of a
DBMS, such as security features, maintainability, and concurrent
processing control. Finally, conclusions and recommendations
based on our research are presented in Chapter XI.

II. BASIC CONCEPTS OF DATABASE

A. DEFINITION OF A DATABASE SYSTEM

The simplest definition of a database might be that a database
is a collection of facts or a repository for stored data which
is both integrated and shared. By "integrated" we mean that the
database may be considered as a unification of several otherwise
distinct data files, with any redundancy among those files
partially or wholly eliminated. By "shared" we mean that
individual pieces of data in the database may be shared among
several different users, in the sense that each of those users
may have access to the same piece of data. The term "shared" is
also extended to cover concurrent sharing: that is, the ability
of several users to access the database at the same time.
[Ref. 3:pp. 3-7]

R.W. Engles [Ref. 4] refers to the data in a database as
"operational data," distinguishing it from input data, output
data, and other kinds of data. Thus, a modified version of
Engles' original definition is that a database is a collection
of stored operational data used by the application systems of
some particular enterprise. "Enterprise" is simply a convenient
generic term for any reasonably self-contained commercial,
scientific, technical, or other organization. Any enterprise
must necessarily maintain a large amount of data about its
operation. This is its "operational data," such as product data,
account data, or military personnel data.

In recent years, technology has improved to the point where it
has become feasible to design, build, and operate large-scale
collections of data in a computer environment. At the same time,
organizations realized that data were a valuable resource and
needed to be centrally managed. The concept of a database has
thus emerged fully only in recent years. A database can also be
defined as a computerized collection of stored operational data
that serves the needs of multiple users within one or more
organizations [Ref. 5:pp. 3-17]. A key point is that the
database is an integrated resource to be used by all members of
the organizations who need information contained in it.

Since the database is an integrated and shared resource for
multiple users within an organization, it should be managed for
the organization's benefit and from its viewpoint, not by
individual users. Thus, two additional concepts have been
developed to solve the problem of controlling and managing the
organization's database resource. The first is software that
provides a common interface between all users and the integrated
database. A common interface promotes privacy and data
integrity. Also, users cannot store information implicitly and
must use and modify data in a manner consistent with the
organization's viewpoint. The software, known as a database
management system, allows computer control of the data resource.
A database management system (DBMS) is a collection of software
tools and access methods which enables the users to store facts
about real-world objects and the relationships between these
objects, and to manipulate those facts by issuing queries in
content-addressable form. In short, a DBMS is a generalized tool
for manipulating a database [Ref. 5:pp. 3-17]; it is made
available through special software for the interrogation,
maintenance, and analysis of data.

The second concept is that of the database administrator (DBA).
The DBA can be thought of as one or more individuals, possibly
aided by a staff, who manage the organization's database
resource [Ref. 5:pp. 3-17] and are responsible for overall
control of the database system. The DBA's responsibilities
include the following [Ref. 3:pp. 25-26]:

.Deciding the information content of the database.
.Deciding the storage structure and access strategy.
.Liaising with users.
.Defining authorization checks and validation procedures.
.Defining a strategy for backup and recovery.
.Monitoring performance and responding to changes in
requirements.

We can clearly see why an organization should choose to store
its operational data in an integrated database. A database
system provides the organization with centralized control of
its operational data, which is one of its most valuable assets.
This is in sharp contrast to the situation that prevails in many
organizations today, where typically each application has its
own private files, so that the operational data is widely
dispersed and is therefore difficult to control.

B. COMPONENTS OF A DATABASE SYSTEM

A  database  system  consists  of  four  major  components: 
data,  hardware,  software,  and  users.  Fig  2.1  shows  a  greatly 
simplified  view  of  the  major  components  of  a  database 
system. 

1. Data

The  data  stored  in  the  system  is  partitioned  into 
one  or  more  databases.  For  tutorial  purposes  it  is  usually 
convenient  to  assume  that  there  is  just  one  database, 
containing  the  totality  of  all  stored  data  in  the  system. 

Figure 2.1 Simplified View of a Database System (labels
recoverable from the figure: application software, DBMS, O.S.,
access methods).

According to standard usage in the computer industry, bits are
grouped into bytes or characters, characters are grouped into
fields, and fields are grouped into records. A collection of
records is called a file. At this point, however, we cannot say
that a database is simply a collection of files. A database is a
collection of "integrated" files and relationships among records
in those files. Database processing differs from file
processing, in which the structure of the files is distributed
across the application programs and each file is considered to
exist independently. The database, on the other hand, is
self-describing because it contains, within itself, a
description of its structure. Another difference between file
processing and database processing concerns the term "file." In
file processing, the records in a file are usually grouped
together physically. In database processing, the logical
collection of records probably does not exist as a physical
collection. In database processing, there are logical files, or
collections of records having meaning to users, and physical
files, or collections of records on physical devices.
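
As a minimal sketch of what "self-describing" means, the
following Python fragment uses the SQLite engine from Python's
standard library as a stand-in DBMS. This choice is an
assumption made for illustration; SQLite is not one of the
systems discussed in this thesis, and the officer table and its
fields are hypothetical. The point is only that the description
of the structure is read back from the database itself rather
than from declarations inside an application program.

```python
import sqlite3

# In-memory database; the "officer" table and its fields are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE officer (officer_id INTEGER PRIMARY KEY, name TEXT, unit TEXT)"
)

# The database keeps a description of its own structure alongside the data.
# SQLite exposes that description through the sqlite_master catalog table.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info lists the fields of a table; column 1 of each row is
# the field name. No record layout is declared in the program itself.
columns = [row[1] for row in conn.execute("PRAGMA table_info(officer)")]

for table_name, definition in catalog:
    print(table_name, "->", definition)
print(columns)
```

In a file-processing environment the same field layout would
have to be repeated in every program that reads the file.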




2.   Hardware 

In general, database applications do not require special
hardware. The hardware consists of direct access storage (or
secondary storage) devices (disks, drums, etc.) on which the
database resides, together with the associated devices, control
units, channels, and so forth. It is assumed that the database
is too large to be stored in its entirety within the computer's
primary storage.

In 1982, a new term appeared and several vendors announced new
products called "database machines" [Ref. 6:p. 8]. These
machines are special-purpose computers that perform database
processing functions. In this type of architecture, the
mainframe or host computer sends requests for service and data
over a channel to the database machine. The machine processes
the requests and sends results, messages, or data back to the
main computer. Thus database processing can be performed
simultaneously with applications processing. The actual
effectiveness of such machines is under investigation. If
substantial processing efficiencies can be demonstrated at a
reasonable cost, then database machines will become important.
Hardware aspects of the system form a major topic in their own
right; the problems encountered in this area are not peculiar to
database systems, and those problems have been very thoroughly
investigated and documented elsewhere. Thus, this thesis is not
concerned with hardware aspects of the system.

3. Software

The database management system or DBMS is a layer of software
which provides the interface between the physical database
itself (i.e., the data as actually stored) and the users of the
system. All requests from users for access to the database are
handled by the DBMS. One general function provided by the DBMS
is the separation of database users from hardware-level detail.
In other words, the DBMS provides a view of the database that is
elevated somewhat above the hardware level, and supports user
operations (such as "get the OFFICER record for officer
Buyukoner") that are expressed in terms of that higher-level
view. This function, and other functions of the DBMS, will be
discussed in detail later.
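
The following is a minimal sketch of such a higher-level
operation, assuming Python as the host language and SQLite (from
Python's standard library) as a stand-in DBMS. The table layout,
the helper get_officer, and the sample values are hypothetical.
The caller names the record it wants and never refers to disks,
blocks, or addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows fields to be read by name

# Hypothetical OFFICER records; the values are illustrative only.
conn.execute(
    "CREATE TABLE officer (name TEXT PRIMARY KEY, officer_rank TEXT, unit TEXT)"
)
conn.executemany(
    "INSERT INTO officer VALUES (?, ?, ?)",
    [("Buyukoner", "Captain", "Unit X"), ("Smith", "Major", "Unit Y")],
)


def get_officer(connection, name):
    """Return the OFFICER record for the named officer, or None if absent."""
    row = connection.execute(
        "SELECT * FROM officer WHERE name = ?", (name,)
    ).fetchone()
    return dict(row) if row is not None else None


record = get_officer(conn, "Buyukoner")
print(record)
```

How the record is located on storage is left entirely to the
DBMS, which is the separation the paragraph above describes.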

Two other types of programs involved in database processing are
the Operating System (OS) and the Communications Control Program
(CCP). The operating system is a set of programs which controls
the computer's resources. In a sense, the OS can be viewed as
the glue that holds all of the other programs together. The
communications control program performs communications-oriented
tasks. On-line processing requests or transactions are entered
by users at terminals. The requests are received and routed by
the CCP over communications lines. The CCP has several important
functions: it provides communications error detection and
correction, manages terminal activity, routes messages to the
correct next destination, and formats messages for various types
of terminal equipment. The CCP also routes on-line input to the
next level of programs, which contains application programs and
database utilities. The operating system and the CCP will not be
discussed further in this thesis.

4. Users

There are three broad classes of user to be considered:
application programmers, end-users, and the database
administrator (DBA). [Ref. 3:p. 6]

The application programmer is responsible for writing
application programs that use the database, typically in a
high-level language such as COBOL or PL/I. These application
programs operate on the data to retrieve information, create new
information, and delete or change existing information. The
programs themselves may be conventional batch applications, or
they may be "on-line" programs that are designed to support an
end-user interacting with the system from an on-line terminal.

The end-user can access the database from a terminal. An
end-user may, in general, perform all the functions of
retrieval, creation, deletion, and modification by employing a
query language provided as an integral part of the system, or by
invoking a user-written application program that accepts
commands from the terminal and in turn issues requests to the
DBMS on the end-user's behalf.

The database administrator, or DBA, mentioned earlier in this
Chapter, is the person (or group of persons) responsible for
overall control of the database system. The function of the DBA
staff is to serve as a protector of the database and as a focal
point for resolving users' conflicts.

C. ADVANTAGES AND DISADVANTAGES OF DATABASE PROCESSING

The main advantage of database processing is implied by the
definition given previously: integrated and shared data offer
important advantages. First, database processing allows more
information to be produced from a given amount of data.
Secondly, the amount of redundancy in stored data can be
minimized. In other words, the elimination or reduction of data
duplication allows data to be stored only once. This saves file
space and, to some extent, can reduce processing requirements.
[Ref. 6:pp. 3-8], [Ref. 7:pp. 1-16]

As mentioned earlier, centralized control of the operational
data in a database provides the following advantages
[Ref. 3:pp. 10-12].

1. Avoidance of Inconsistency

This is really a corollary of the point on redundancy above. If
a given fact about the real world is represented by two
different entries in the database and the redundancy is not
controlled, then there will be some occasions on which the two
entries will not agree (that is, when only one has been
updated). At such times the database is said to be inconsistent,
and it may produce incorrect or conflicting information. If the
redundancy is controlled, then the system can guarantee that the
database is never inconsistent as seen by the user, by ensuring
that any change made to either of the two entries is
automatically made to the other. This process is known as
propagating updates (the term "update" is used to cover all the
operations of creation, deletion, and modification).
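
The idea of propagating updates can be sketched in a few lines
of Python. This is an illustration under stated assumptions, not
a description of how any particular DBMS implements it: the
class ControlledRedundancy, the file names, and the record keys
are all hypothetical. The system records where each copy of a
fact is stored and applies every change to all copies, so a
reader never sees two copies that disagree.

```python
class ControlledRedundancy:
    """Keeps redundant copies of one fact in step by propagating updates."""

    def __init__(self):
        self._locations = {}  # fact name -> list of (file, key) locations
        self._store = {}      # (file, key) location -> stored value

    def register(self, fact, location, value):
        """Record that `location` holds a copy of `fact`."""
        self._locations.setdefault(fact, []).append(location)
        self._store[location] = value

    def update(self, fact, value):
        """Apply the change to every registered copy of the fact."""
        for location in self._locations[fact]:
            self._store[location] = value

    def read(self, location):
        return self._store[location]

    def is_consistent(self, fact):
        """True when all copies of the fact hold the same value."""
        values = {self._store[loc] for loc in self._locations[fact]}
        return len(values) == 1


db = ControlledRedundancy()
db.register("unit_of_Buyukoner", ("personnel_file", 17), "Unit X")
db.register("unit_of_Buyukoner", ("assignment_file", 4), "Unit X")

db.update("unit_of_Buyukoner", "Unit Y")  # one logical change, two copies
print(db.read(("personnel_file", 17)), db.read(("assignment_file", 4)))
```

With uncontrolled redundancy, the two files would be written by
separate programs and nothing would force the second write.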

2. Shared Data

The  concept  of  shared  data  was  discussed  in  Section 
A. 

3. Enforcement of Standards

The applicable standards (which may include installation,
company, industry, and national standards) are followed in the
representation of the data. Standardizing stored data formats
assists in data interchange or migration between systems.

4. Application of Security Restrictions

The DBA can define authorization checks to be carried out
whenever access to sensitive data is attempted (see Chapter X
for more detail).
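
A minimal sketch of such an authorization check follows, written
in Python. The rule table READ_RULES, the role names, and the
function read_field are hypothetical and stand in for whatever
mechanism a given DBMS provides. The check is made centrally, on
every attempted access, rather than in each application program.

```python
class AuthorizationError(Exception):
    """Raised when a role attempts an access it has not been granted."""


# Hypothetical rules a DBA might define: field -> roles allowed to read it.
READ_RULES = {
    "name": {"finance", "personnel"},
    "unit": {"finance", "personnel"},
    "salary": {"finance"},
}


def read_field(record, field, role):
    """Return record[field] only if `role` is authorized to read `field`."""
    allowed_roles = READ_RULES.get(field, set())  # unknown fields: no access
    if role not in allowed_roles:
        raise AuthorizationError(f"role {role!r} may not read {field!r}")
    return record[field]


record = {"name": "Buyukoner", "unit": "Unit X", "salary": 3000}

unit = read_field(record, "unit", "personnel")  # permitted
try:
    read_field(record, "salary", "personnel")   # sensitive field
    denied = False
except AuthorizationError:
    denied = True
print(unit, denied)
```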
5. Maintenance of Data

The problem of integrity is the problem of ensuring that the
data in the database is accurate. Inconsistency between two
entries representing the same "fact" (which can occur only if
redundancy exists in the stored data) leads to a lack of data
integrity. It is essential to point out that data integrity is
even more important in a database system than in a "private
files" environment, because the database is shared. Centralized
control of the database supports data integrity.

6.   Balancing  of  Conflicting  Requirements 

Knowing the overall requirements of the enterprise, the DBA can
structure the database system to provide an overall service that
is "best for the enterprise."

The cost of database processing may become a major
disadvantage. The DBMS may need so much primary storage that
additional storage must be purchased. Even with more storage, it
may take exclusive control of the CPU, thus forcing the user to
upgrade to a more powerful computer. [Ref. 6:pp. 3-8]

Once the database is implemented, operating costs for some
systems will be higher. For example, sequential processing will
not be done as fast in the database environment, because the
DBMS adds overhead.

Large amounts of data in different formats can be interrelated
in the database. Both the database system and the application
programs must be able to process these structures. This requires
more sophisticated programming, takes time, and requires highly
skilled programming personnel. Thus, complexity is another
important disadvantage of database processing. Backup and
recovery also increase complexity and are more difficult to
carry out in the database environment. One reason for this is
that databases are often processed by multiple users
concurrently. Determining the exact state of the database at the
time of failure may be a problem. Given that, it may be even
more difficult to determine what should be done next.

Another  disadvantage  is  that  integration,  and  hence 
centralization,  increases  vulnerability.  A  failure  in  one 
component  of  an  integrated  system  can  cause  the  entire 
system  to  fail.  This  event  is  especially  critical  if  the 
operation  of  the  user  organization  depends  on  the  database. 

To avoid these potential drawbacks, a database management
system (DBMS) should satisfy the following objectives
[Ref. 7:pp. 13-14]:

.Different functions of an enterprise can be served effectively
by the same DBMS.
.Redundancy in stored data can be minimized.
.Consistent information can be supplied for the decision-making
process.
.Security controls can be applied.
.Application programs can be developed, maintained, and enhanced
faster and more economically, with fewer skilled personnel.
.Physical reorganization of the stored data is easy.
.Centralized control of the database is possible.
.Easier procedures for computer operations can be established.


D. AN ARCHITECTURE FOR A DATABASE SYSTEM

An architecture for a database system is illustrated in
Fig. 2.2 [Ref. 3:p. 20]. This picture presents a framework which
is extremely useful for describing general database concepts and
for explaining the structure of individual systems, and it is in
broad agreement with that proposed by the ANSI/SPARC Study Group
on Data Base Management Systems [Ref. 8].

The architecture is divided into three general levels:
internal, conceptual, and external. Generally speaking, the
external level is the one closest to the users; that is, the one
concerned with the way in which the data is viewed by individual
users. The internal level is the level closest to physical
storage; that is, the one concerned with the way in which the
data are actually stored. The conceptual level is a bridge or
"level of indirection" between the other two. There may be many
"external views," each consisting of a more or less
user-oriented logical representation of some portion of the
database (such as logical records and fields), and there will be
a single "conceptual view," consisting of a similarly logical
representation of the entire database. Likewise, there will be a
single "internal view," representing the total database as
actually stored.

The three levels are also defined as levels of abstraction and
named in the specification of a database structure: the
conceptual or enterprise administrator view, the implementation
view of the applications programmer or end user, and the
physical view of the systems programmer/analyst
[Ref. 5:pp. 3-17]. The external level, conceptual level, and
internal level in the ANSI/SPARC model correspond to the
implementation level, conceptual level, and physical level in
the levels of abstraction, respectively. Figure 2.3 shows these
three levels of abstraction and some of their primary
components.

Figure 2.2 Database System Architecture.

CONCEPTUAL LEVEL: Enterprise administrator view
.Entities
.Attributes
.Relationships

IMPLEMENTATION LEVEL: Applications programmer or end-user view
.Records
.Data items
.Interrecord relationships

PHYSICAL LEVEL: Systems programmer/analyst and physical device
view
.Blocks
.Pointers
.Overhead data
.Clusterings

Figure 2.3 Levels of Abstraction in a Database System.

It should be obvious that between the computer, dealing with
bits, and the ultimate user, dealing with abstractions such as
military units or assignment of personnel to a division, there
will be many levels of abstraction. It should be emphasized that
only the database actually exists at the physical level. We may
view the physical database itself at several levels of
abstraction, ranging from that of records and files in a
programming language such as Pascal, through the level of
logical records, as supported by the operating system underlying
the DBMS, down to the level of bits and physical addresses on
storage devices. We may also view the conceptual database as an
abstraction of the real world pertinent to an enterprise.

J.D. Ullman [Ref. 9:pp. 5-9] emphasizes that a view (or external
view) is an abstract model of a portion of the conceptual
database. As an example of the utility of views, the army may
provide a computerized personnel assignment department,
consisting of data and a collection of programs that deal with
officers and military units. These programs, and the people who
use them, do not require knowledge concerning personnel files or
the assignment of officers to units. The personnel department
may need to know about assignments, units, and aspects of the
personnel files (e.g., which officers are qualified to be
assigned to unit X), but does not need to know about personnel
salaries. Thus, there may be one view of the database for the
personnel department and another for the finance department.

"In a sense, a view is just a small conceptual database, and it
is at the same level of abstraction as the conceptual database.
However, there are senses in which a view can be 'more abstract'
than a conceptual database, as the data dealt with by a view may
be constructible from the conceptual database but not actually
present in that database." [Ref. 9:p. 7]
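
The two departmental views, and a view whose data is
constructible from but not present in the underlying database,
can be sketched as follows. The sketch assumes Python with the
SQLite engine from its standard library as a stand-in DBMS; the
table, the view names, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE officer (
        name          TEXT PRIMARY KEY,
        unit          TEXT,
        qualification TEXT,
        salary        INTEGER
    );
    INSERT INTO officer VALUES ('Buyukoner', 'Unit X', 'artillery', 3000);
    INSERT INTO officer VALUES ('Smith',     'Unit Y', 'signals',   3500);

    -- Personnel department: assignments and qualifications, no salaries.
    CREATE VIEW personnel_view AS
        SELECT name, unit, qualification FROM officer;

    -- Finance department: salaries, no qualification detail.
    CREATE VIEW finance_view AS
        SELECT name, salary FROM officer;

    -- Totals are constructible from the stored data but are not stored.
    CREATE VIEW unit_payroll AS
        SELECT unit, SUM(salary) AS total_salary FROM officer GROUP BY unit;
    """
)

# Field names visible through each departmental view.
personnel_columns = [
    d[0] for d in conn.execute("SELECT * FROM personnel_view").description
]
finance_columns = [
    d[0] for d in conn.execute("SELECT * FROM finance_view").description
]
payroll = dict(conn.execute("SELECT unit, total_salary FROM unit_payroll"))
print(personnel_columns, finance_columns, payroll)
```

The unit_payroll view corresponds to the "more abstract" case in
the quotation: no stored record contains a unit total.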
Figure 2.2 illustrates the several components of the
architecture in more detail. Next, we will examine these
components.

Each user has a language at his or her disposal. For the
application programmer it will be a high-level programming
language, such as PL/I or COBOL; for the terminal user it will
be either a query language or a special-purpose language
supported by an on-line application program to meet the user's
requirements. Each of those languages is known as a "host
language." The "data sublanguage" (DSL) is the subset of the
host language that is concerned with database objects and
operations. In other words, the DSL is embedded in a host
language. Multiple host languages and multiple DSLs may be
supported by a given system. In principle, any given data
sublanguage is really a combination of two languages
[Ref. 3:pp. 17-25]: a data definition language (DDL), which
provides for the definition or description of database objects,
and a data manipulation language (DML), which supports the
manipulation or processing of such objects. In most systems
today the data sublanguage and the host language are very
loosely coupled. That is, the definitions written in DDL are
completely outside the application program.
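
The division of a data sublanguage into DDL and DML can be
sketched as follows. The sketch rests on assumptions not taken
from the thesis: Python plays the role of the host language, SQL
plays the role of the data sublanguage, and SQLite (from
Python's standard library) is the stand-in DBMS. The course
table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define (describe) a database object.
conn.execute("CREATE TABLE course (course_no TEXT PRIMARY KEY, title TEXT)")

# DML: manipulate the object (creation, modification, retrieval, deletion).
conn.execute("INSERT INTO course VALUES (?, ?)", ("CS101", "Database Systems"))
conn.execute(
    "UPDATE course SET title = ? WHERE course_no = ?",
    ("Database Design", "CS101"),
)
titles = [row[0] for row in conn.execute("SELECT title FROM course")]
conn.execute("DELETE FROM course WHERE course_no = ?", ("CS101",))
remaining = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]

print(titles, remaining)
```

Here the sublanguage statements are passed to the DBMS as
strings, which is one form of the loose coupling between host
language and data sublanguage described above.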

An external view is the content of the database as it is seen
by some particular user. In general, an external view consists
of multiple occurrences of multiple types of external records
[Ref. 3:pp. 17-25]. An external record is a "logical record,"
which is not necessarily the same as a stored record (see
Section E of this Chapter).

Each external view is defined by means of an external schema,
which is made up of definitions of each of the different types
of external records in that view. The term "view" is used for a
set of record occurrences and the term "schema" is used for the
definition of that view. The DDL portion of the data sublanguage
is used to write the external schema. That DDL is sometimes
called an external DDL.

The conceptual view is a representation of the entire
information content of the database in a form that is somewhat
abstract in comparison with the way in which the data are
physically stored. The conceptual view is composed of multiple
occurrences of multiple types of conceptual records. It is more
desirable to think in terms of "entities" and "relationships"
than in terms of "conceptual records." A conceptual record is
not the same as either an external record or a stored record.
For example, the conceptual view may consist of a collection of
branch record occurrences plus a collection of military
personnel record occurrences plus a collection of course record
occurrences, and so on. The conceptual view is defined by means
of the conceptual schema, which includes definitions of each of
the several types of conceptual records. The conceptual view is
a view of the total database content, and the conceptual schema
is a definition of this view. The conceptual schema is written
using another DDL called the conceptual DDL. It is intended that
the definitions in the conceptual schema include many additional
features, such as the authorization checks and validation
procedures, but these definitions must not involve any
considerations of storage structure or access strategy. In other
words, there must not be any reference to stored field
representations, physical sequence, hash-addressing, indexing,
or any other storage/access details. This restriction allows the
conceptual model to be "data independent," a property which will
be discussed in the next Section.

Some authorities would suggest that the fundamental
objective of the conceptual schema is to describe the entire
enterprise; not just its operational data, but also how that
data is used, how the data flows within the enterprise, what
the data is used for at each point, what audit controls are
to be applied at each point, and so on.

The internal view or physical level of the architecture
is a very low-level representation of the entire database.
It consists of multiple occurrences of multiple types of
stored records (ANSI/SPARC uses the term "internal record").
The internal view is defined by means of the internal schema,
which not only defines the various types of stored records
but also specifies what indexes exist, how stored fields are
represented, what physical sequence the stored records are
in, and so on [Ref. 3:pp. 17-25]. Another data definition
language (the internal DDL) is used to write the internal
schema. It is convenient to use the term "stored database" in
place of "internal view," and "storage structure definition"
in place of "internal schema."

Two levels of mapping are shown in Fig. 2.2. The
conceptual/internal mapping describes the correspondence
between the conceptual view (or data model) and the stored
database; it specifies how conceptual records and fields map
into their stored counterparts. If the structure of the
stored database is changed, the conceptual/internal mapping
must be changed accordingly, so that the conceptual schema
may remain invariant. For example, if a change is made to
the storage structure definition of the database, the
effects of such a change must be contained below the
conceptual level, so that "data independence" can be
accomplished.

An external/conceptual mapping describes the correspondence
between a specific external view and the conceptual view. In
general, the same kinds of differences may exist between
these two levels as may exist between the conceptual view and
the stored database. For example, records may be in different
sequences, fields may have different data types, and so on.
Different external views may overlap. That is, any number of
external views may exist at the same time and any number of
users may share a given external view. Some systems allow the
definition of one external view to be expressed in terms of
others without always requiring an explicit definition of the
mapping to the conceptual level. If various external views
are closely related to one another, this is a very useful
feature of the system.

Referring again to Fig. 2.2, there still remain three
components of the architecture: the database management
system (DBMS), the database administrator (DBA), and the
user interface. The DBMS is the software that handles all
access to the database. The basic steps that occur in a DBMS
are the following [Ref. 3:pp. 17-25]:

1. A user issues an access request, using some particular
   data manipulation language,

2. the DBMS intercepts the request and interprets it,

3. the DBMS inspects, in turn, the external schema, the
   external/conceptual mapping, the conceptual schema,
   the conceptual/internal mapping, and the storage
   structure definition, and

4. the DBMS performs the necessary operations on the
   stored database.

For example, assume that a user wishes to retrieve a
particular external record occurrence. In general, the DBMS
must retrieve all required stored record occurrences,
construct the required conceptual record occurrences, and
then construct the required external record occurrence. At
each step, data type or other conversions may be necessary.
A retrieval request may require fields from several
conceptual record occurrences, and each conceptual record
occurrence, in turn, may require fields from several stored
record occurrences.
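
The retrieval path just described can be sketched as two small
mapping functions. This is only an illustration under stated
assumptions: the stored files, field names, branch codes, and
conversions below are all hypothetical and do not describe any
particular DBMS.

```python
# Stored database: two hypothetical stored files keyed by a stored id.
STORED_PERSON = {7: {"nm": b"SMITH", "br": 2}}  # name as bytes, branch as a code
STORED_PAY = {7: {"pay_cents": 250000}}         # pay kept in cents

BRANCH_CODES = {1: "Infantry", 2: "Signal"}


def conceptual_record(person_id):
    """Conceptual/internal mapping: build one conceptual record from
    fields held in several stored record occurrences."""
    person = STORED_PERSON[person_id]
    pay = STORED_PAY[person_id]
    return {
        "id": person_id,
        "name": person["nm"].decode("ascii"),   # representation conversion
        "branch": BRANCH_CODES[person["br"]],   # decoding of a stored code
        "pay": pay["pay_cents"] / 100,          # unit conversion
    }


def external_record(person_id):
    """External/conceptual mapping: one user's view keeps only some
    fields and presents the name in a different form."""
    record = conceptual_record(person_id)
    return {"NAME": record["name"].title(), "BRANCH": record["branch"]}


view = external_record(7)
```

The user of external_record never sees the stored field names or the
branch codes; those details are confined to the lower mapping.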

The database administrator (DBA), previously discussed
to some extent, controls the overall database system. We
will mention only the utilities and tools that are required
to carry out the DBA's tasks. Such utilities are an
essential part of a database system. For instance, loading
routines, reorganization routines, journaling routines,
recovery routines, and statistical analysis routines may be
included as utilities. One of the most important DBA tools
is the "data dictionary" (not shown in Fig. 2.2). The data
dictionary is effectively a database in its own right, one
that contains descriptions of other objects in the system.
In particular, all the various schemas (external,
conceptual, internal) are physically stored in both source
and object form in the dictionary. A comprehensive
dictionary will also include cross-reference information,
showing, for example, which programs use which pieces of
data, which departments require which reports, and so on. It
is possible to query the dictionary just like any other
database, so that the DBA can easily discover which programs
are likely to be affected by some change to the system.
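
A minimal sketch of such a cross-reference query follows. The
program names and data element names are invented for illustration,
and a real data dictionary would hold far more than this one table.

```python
# Hypothetical cross-reference entries: program name -> data elements it uses.
USES = {
    "PAYROLL": {"officer.name", "officer.pay"},
    "ROSTER": {"officer.name", "unit.name"},
    "TRAINING": {"course.title"},
}


def programs_affected_by(element):
    """Return, in sorted order, the programs that use the given data element."""
    return sorted(prog for prog, elems in USES.items() if element in elems)


affected = programs_affected_by("officer.name")
```

Before changing the definition of a data element, the DBA could run a
query of this kind to list the programs that would need review.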

A data dictionary should help a database user in the
following ways [Ref. 7:pp. 20-21]:

.Communicating with the other users.
.Controlling the data elements in a simple and effective
 manner, that is, introducing new elements into the
 systems, or changing the definitions of the elements.
.Reducing data redundancy and inconsistency.
.Determining the impact of changes to data elements on
 the total database.
.Centralizing the control of the data elements as an aid
 in database design and in expanding the design.

The user interface, shown in Fig. 2.2, may be defined as
a boundary in the system below which everything is
transparent (invisible) to the user. Thus, the user interface
is at the external level.

E. DATA INDEPENDENCE

The concept of data independence may be easily understood
by first introducing its opposite. Currently, many
applications are data-dependent. This means that the
requirements of the application dictate both the way in
which the data are organized in secondary storage and the
way in which they are accessed, and, moreover, that
knowledge of the data structure and access method is built
into the application logic. In this case, the application
programmer has to know the data format, the location where
the data is stored, and the access method, which tells how
the data is accessed. Changes in any of these items may
affect the application program and result in other changes,
since the details of these three points may be embedded into
the application code. It is also likely that as the needs of
the enterprise change, the format of the data may change,
and the data set has to be expanded by adding information
about different types of entities or additional information
about existing entities.

Such an application is said to be data-dependent because
it is impossible to change the storage structure (how the
data is physically stored) or the access method without
affecting the application. For example, it would not be
possible to replace an indexed sequential file by a
hash-addressed file without making any changes to the
application.

In a database system, there are at least two important
reasons why applications must be data-independent
[Ref. 3:pp. 12-17].

1. Different applications will need different views of
   the same data.

2. The DBA must have the freedom to change the storage
   structure or access strategy (or both) in response to
   changing requirements without having to modify
   existing applications. For example, the enterprise
   may adopt new standards, application priorities may
   change, new types of storage device may become
   available, and so on.

If applications are data-dependent, such changes involve
corresponding changes to programs, requiring programmers to
spend an increasing percentage of their time in program
maintenance and updating.

Therefore, the provision of data independence is a major
objective of database systems. S. Atre [Ref. 7:p. 17] defines
data independence as

"The ability to use the database without knowing the
representation details."

It can also be defined as the immunity of applications to
change in storage structure and access strategy, which
implies that the applications concerned do not depend on any
one particular storage structure and access strategy. In
Section D, we presented an architecture for a database
system that provides a fundamental principle for achieving
this objective.

Data independence provides, at a central location, a
solution to the problems discussed above. The individual
application programmer is not required to change the
application programs to accommodate changes in access method
or location or format of the data. Unfortunately, it is
difficult to achieve full data independence in a database
system: even the best database design is limited by the
capabilities of the DBMS software packages available today.
The central location for reflecting changes in the storage
structure and the access strategy should be anchored in the
DBMS. The important questions are when, where, and why
changes to the DBMS should be specified, who should specify
them, and who should control them. The DBA should, of
course, be given these responsibilities.

The reasons for data independence are summarized as
follows:

"1. To allow the DBA to make changes in the content,
location, representation and organization of a database
without causing reprogramming of application programs
which use the database.

2. To allow the supplier of data processing equipment
and software to introduce new technologies without
causing reprogramming of the customer's application.

3. To facilitate data sharing by allowing the same
data to appear to be organized differently for different
application programs.

4. To simplify application program development and,
in particular, to facilitate the development of programs
for interactive database processing.

5. To provide the centralization of control needed
by the DBA to insure the security and integrity of the
database." [Ref. 7:pp. 17-18]

The levels of abstraction mentioned in Section D above,
from the external view to the conceptual view to the internal
view, provide two stages of "data independence." In a
well-designed database system, the internal schema can be
modified by the DBA without altering the conceptual schema or
requiring a redefinition of the external schemas (or
subschemas). This independence is known as physical data
independence. The advantage of physical data independence is
that it permits "tuning" of the internal schema for
efficiency while allowing application programs to run as if
no change had occurred [Ref. 9:pp. 5-9].
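
Physical data independence can be illustrated with a small sketch
in which the same application function runs unchanged over two
different storage structures. The class names, the fetch interface,
and the sample records are assumptions made for this example; the
sorted list searched by bisection merely stands in for an indexed
sequential file.

```python
import bisect


class SortedFile:
    """Storage structure 1: records kept in key order and found by
    binary search (a stand-in for an indexed sequential file)."""

    def __init__(self, records):
        self._rows = sorted(records.items())
        self._keys = [key for key, _ in self._rows]

    def fetch(self, key):
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._rows[pos][1]
        raise KeyError(key)


class HashedFile:
    """Storage structure 2: records located by hashing the key."""

    def __init__(self, records):
        self._table = dict(records)

    def fetch(self, key):
        return self._table[key]


def officer_branch(store, officer_id):
    """Application logic: knows only the fetch interface, not the structure."""
    return store.fetch(officer_id)["branch"]


DATA = {2: {"branch": "Signal"}, 1: {"branch": "Infantry"}}
before = officer_branch(SortedFile(DATA), 2)
after = officer_branch(HashedFile(DATA), 2)
```

Replacing SortedFile with HashedFile corresponds to a change in the
internal schema; officer_branch is not edited.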

The relationship between external views and the
conceptual view also gives a type of independence called
logical data independence. Many changes to the conceptual
schema can be made without affecting existing external
schemas, and other changes to the conceptual schema can be
made if the external/conceptual mapping is redefined by the
DBA. Again, no change to the application programs is
necessary [Ref. 9:pp. 5-9].
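
As a hedged illustration of logical data independence, the sketch
below uses SQL views in SQLite (through Python's standard sqlite3
module) to play the role of the external schema. The table, view, and
column names are hypothetical, and a relational system is assumed
only because it makes the mapping easy to show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original conceptual schema: one table; the external view is defined on it.
conn.executescript("""
    CREATE TABLE officer (id INTEGER PRIMARY KEY, name TEXT, unit_name TEXT);
    INSERT INTO officer VALUES (1, 'Smith', 'Alpha');
    CREATE VIEW roster AS SELECT name, unit_name FROM officer;
""")


def application_query():
    """The application sees only the external view ROSTER."""
    return conn.execute("SELECT name, unit_name FROM roster").fetchall()


before = application_query()

# Conceptual schema change: unit data moves to its own table.
# Only the external/conceptual mapping (the view definition) is redefined.
conn.executescript("""
    DROP VIEW roster;
    CREATE TABLE unit (id INTEGER PRIMARY KEY, unit_name TEXT);
    INSERT INTO unit VALUES (10, 'Alpha');
    CREATE TABLE officer2 (id INTEGER PRIMARY KEY, name TEXT, unit_id INTEGER);
    INSERT INTO officer2 SELECT o.id, o.name, u.id
        FROM officer AS o JOIN unit AS u ON u.unit_name = o.unit_name;
    DROP TABLE officer;
    CREATE VIEW roster AS
        SELECT o.name AS name, u.unit_name AS unit_name
        FROM officer2 AS o JOIN unit AS u ON u.id = o.unit_id;
""")

after = application_query()
```

The function application_query is identical before and after the
restructuring; only the view definition was rewritten.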

In order to use standard terms as much as possible, it
is essential to give some definitions for a database system.

"A stored field is the smallest named unit of data
stored in the database. The database, in general,
contains many occurrences or instances of each of
several types of stored fields.

A stored record is a named collection of associated
stored fields. A stored record occurrence or instance
consists of a group of related stored field occurrences
(and represents an association between them). In most
systems, the stored record occurrence is the unit of
access to the database.

A stored file is the (named) collection of all
occurrences of one type of stored record." [Ref. 3:p. 14]

We conclude this Chapter by pointing out that the DBMS
can provide independence from:

.underlying representations such as representation of
 numeric data, representation of character data, data
 encoding/decoding, and units for numeric data.
.data structure such as materialization of computed
 fields, and structure of records and files.

III. AN OVERVIEW OF DATABASE DESIGN

A. INTRODUCTION

Designing a database is a difficult, complex, and
time-consuming process. Unfortunately, the result is often an
inadequate database, one that cannot satisfy present or
future organizational requirements.

The process of developing a database structure from user
requirements is called database design. Many database
designers have argued that there are at least two separate
steps in the database design process: the design of a
logical database structure which is processible by the DBMS
and describes the user's view of the data, and the selection
of a physical structure that includes data representation or
encoding, access methods, and physical clustering of data.
Other than the logical/physical description, however, the
overall structure of the design process has not been well
defined, and even the logical/physical boundary has been
open to considerable dispute.

General information requirements include a statement of
the objectives of the database system, definition of the
data elements to be included in the database, and a
description of data element usage in the users'
organizations. These requirements are not tied to any
specific application; therefore, database structure design
based on such requirements is considered to be advantageous
for long-term databases that must be adaptable to changing
applications.

Processing requirements consist of three distinguishable
components: specific data items required for each
application, the data volume (number of data occurrences),
and processing frequencies in terms of the number of times
each application must be run per unit time. DBMS
specifications and the operating system/hardware
configuration are also used by the designer.

Performance measures and performance constraints should
be considered by the designer. Typical constraints include
upper bounds on response times to queries, recovery times
from system crashes, or specific data needed to support
certain security or integrity requirements.

Two major results of the database design process are
the complete database structure and guidelines for
application programmers based on database structure and
processing requirements.


B. DATABASE SYSTEM LIFE CYCLE

The database system life cycle is a convenient and
useful framework from which to view the database system as
it evolves over time. This framework provides an ordered
background to the functions of a database administrator and
is divided into three separate phases: analysis and design,
database implementation and operation, and reorganization.
These three phases are composed of the following steps:

. Analysis and design phase
     1. Requirements formulation and analysis
     2. Conceptual design
     3. Implementation design
     4. Physical design

. Database implementation and operation phase
     1. Database implementation
     2. Operation and monitoring

. Reorganization phase (Modification and adaptation)

C. ANALYSIS AND DESIGN PHASE

A stepwise design methodology for the database designer or
database administrator will be explained in this section.
The general interconnections between steps are illustrated
in Figure 3.1 [Ref. 5:p. 26].

1. Requirements Formulation and Analysis

Requirements formulation and analysis constitute the
most important step of the entire database design process,
since most subsequent design decisions are based on this
step. It is, however, the most poorly defined and
time-consuming step of the entire process.

Contemporary  database  applications  are  very  broad 
and  very  sophisticated.  Many  diverse  applications  may  use 
the  same  integrated  database.  The  design  of  a  database  to 
support  all  the  applications  becomes  very  complex.  A  design, 
without  sufficient  information  to  support  the  analysis,  will 
not  be  valid. 

The major task is collecting information content and
processing requirements from all the identified and
potential users of the database. Analysis of the requirements
ensures the consistency of users' objectives as well as the
consistency of their views of the organization's information
flow.

This activity includes the establishment of
organizational objectives, derivation of specific database
requirements from those objectives or directly from
management and nonmanagement personnel, and documentation of
those requirements in a form that is agreeable to both end
users and database designers. The technique used is personal
interviews with various levels of management and key
employees involved in the processing of data and services in
the organization [Ref. 5:p. 25].

Figure 3.1 Basic Database Design Steps.

There is a need for corporate requirements analysis
in the requirements formulation and analysis step. Data
items and their relationships must be defined, and conflicts
must be at least recognized, if not resolved, during
corporate requirements analysis.

Different  departments  use  different  names  for  the 
same  things,  and  the  same  names  for  different  things,  so 
that  a  preliminary  common  view  of  data  and  processes  must  be 
available  before  later  steps  can  provide  reliable  results. 
Such  a  common  view  can  be  derived  only  in  cooperation  with 
users.  However,  this  common  view  will  not  necessarily 
resemble  the  final  database  structure.  In  conclusion,  there 
are  two  design  constraints  for  this  step: 

.accurately  modeling  real  world  requirements 

.aggregating  individual  views. 

2. Conceptual Design

Conceptual design deals with information independent
of any actual implementation (i.e., any particular hardware
or software system). The main purpose of conceptual design
is to represent information in a form that is comprehensible
to the user independent of system specifics, but
implementable on several systems. The result of conceptual
design is called the conceptual schema because it is a
representation of the user's "world" view and independent of
any DBMS software or hardware considerations.

This step results in a high-level representation of
diverse users' information requirements such as an
entity-relationship (E-R) diagram or a Semantic Data Model
(SDM) application.

In most representation mechanisms, the users
describe their information needs in terms of entities,
attributes, and relationships (E-R diagrams), or in terms of
records, items, and sets using a DBMS's data description
language (DDL). It is clear that a great deal of generality
and potential design optimality are lost when users are
restricted to a particular low-level data description
language instead of a higher-level representation mechanism
to specify their information requirements. Similarly, the
goal of the Semantic Data Model (SDM) is as follows:

"Our goal is the design of a higher-level database model
that will enable the database designer to naturally and
directly incorporate more of the semantics of a database
into its schema. Such a semantics-based database
description and structuring formalism is intended to
serve as a natural application modeling mechanism to
capture and express the structure of the application
environment in the structure of the database."
[Ref. 10:p. 352]

There are two major reasons for a designer to use a
high level of abstraction in the design process. First,
entities, attributes, and relationships are not always
explicitly distinguished and the design decisions are often
fuzzy. Second, the problem of consistency checking would be
simplified if a common, high-level information
representation for conceptual information structures could be
developed.

As an example, conceptual design can be done by
entity modeling, which is the representation and integration
of user views in terms of entity diagrams. There are four
basic design decisions required to formulate the entity
diagrams.

1. Selection of entities

2. Selection of entity attributes

3. Selection of key attributes for entities

4. Selection of relationships between entities.

3. Implementation Design

The major goal of the implementation design step is
to use the results of the conceptual design step and the
processing requirements as input to create a
DBMS-processible schema as output. Refinements to the
database structure that occur during this design step are
developed from the viewpoint of satisfying DBMS-dependent
constraints as well as constraints specified in the user
requirements.

Implementation design contains database structure
design and design of programs. The database structure is a
DBMS-processible data definition or schema, usually
expressed in a data definition language. If there are
physical parameters to be selected in a data definition
language, selection of appropriate characteristics is
deferred until the physical design step. The program design
is related to the development of structured programs using
the host language and data manipulation language of the DBMS.
Conceptual design and implementation design steps are
together referred to as logical design by some authors.
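
To make the input and output of this step concrete, the following
sketch turns a small conceptual description into DBMS-processible
DDL text. It assumes a relational target and SQL-style CREATE TABLE
statements; the entity names, attribute names, and types are
hypothetical, and real implementation design involves many decisions
that this fragment ignores.

```python
# Hypothetical output of conceptual design:
# entity -> list of (attribute, type, is_key).
ENTITIES = {
    "officer": [("officer_id", "INTEGER", True), ("name", "TEXT", False)],
    "unit": [("unit_id", "INTEGER", True), ("name", "TEXT", False)],
}


def to_ddl(entity, attributes):
    """Render one entity as a CREATE TABLE statement (relational DDL assumed)."""
    columns = [
        f"{name} {sql_type}{' PRIMARY KEY' if is_key else ''}"
        for name, sql_type, is_key in attributes
    ]
    return f"CREATE TABLE {entity} ({', '.join(columns)});"


schema = [to_ddl(entity, attrs) for entity, attrs in ENTITIES.items()]
```

Physical parameters such as block size or index choice are
deliberately absent here, in keeping with their deferral to the
physical design step.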

4. Physical Design

Physical database design is the process of developing
an efficient, implementable physical database structure from
a given logical database structure that has been shown to
satisfy user information requirements.

Physical database structure represents stored record
format, access method, and device allocations for a
multiple-record-type database.

Major decision classes of physical design are:

1. Stored record format design. This contains all forms
   of data representation and compression in stored
   records. It also contains record partitioning. Record
   partitioning defines an allocation of individual data
   items to separate physical devices of the same or
   different type, or separate extents on the same
   device, so that the total cost of accessing data for
   a given set of user applications is minimized.

2. Access method design. An access method provides
   storage and retrieval capabilities for data stored
   on physical devices, usually secondary storage.
   Storage structure and search mechanisms are two
   important components of an access method. Storage
   structure defines the limits of possible access paths
   through indexes and stored records, and the search
   mechanisms define which paths are to be selected for
   given applications. A given file may have many
   associated access paths. Physical databases may require
   several primary access paths. Efficiency considerations
   for the dominant application drive the design of
   individual files. Access time can be greatly reduced
   through secondary indexes, but at the expense of
   increased storage space overhead and index
   maintenance.

3. Stored record clustering. The physical allocation of
   stored records to physical extents is one of the most
   important design decisions. Record clustering
   involves the allocation of records of different types
   into physical clusters to take advantage of physical
   sequentiality whenever possible. Analysis of record
   clustering must take access path configurations into
   account to avoid access-time degradation due to a new
   placement of records. Clustering also involves
   block-size selection for efficient retrieval. Blocks in a
   given clustered extent are influenced by stored
   record size and storage characteristics of the
   physical devices.
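
The trade-off noted under access method design, faster retrieval
through a secondary index at the cost of extra storage and
maintenance, can be sketched as follows. The in-memory dictionaries
stand in for stored files, and all record contents and function names
are hypothetical.

```python
from collections import defaultdict

# Stored records reached through a primary access path on the record key.
RECORDS = {
    1: {"name": "Smith", "branch": "Signal"},
    2: {"name": "Jones", "branch": "Infantry"},
    3: {"name": "Brown", "branch": "Signal"},
}


def scan_by_branch(branch):
    """Without a secondary index: examine every stored record."""
    return sorted(key for key, rec in RECORDS.items() if rec["branch"] == branch)


# Secondary index on BRANCH: extra storage that must be maintained on update.
BRANCH_INDEX = defaultdict(set)
for key, rec in RECORDS.items():
    BRANCH_INDEX[rec["branch"]].add(key)


def lookup_by_branch(branch):
    """With the secondary index: go straight to the matching keys."""
    return sorted(BRANCH_INDEX.get(branch, ()))


def insert_record(key, rec):
    """Every insertion now pays for index maintenance as well."""
    RECORDS[key] = rec
    BRANCH_INDEX[rec["branch"]].add(key)


insert_record(4, {"name": "Green", "branch": "Signal"})
```

Both functions return the same keys; the indexed lookup avoids the
full scan, while insert_record shows the added maintenance work.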

IV. DATABASE MODELS

A. INTRODUCTION

A data model is a representation of data and their
relationships which describes ideas about the real world.
Data models have been used to represent a conceptual view and
an implementation view of data. Therefore, we will classify
the data models as follows:

1.  Conceptual  data  models 
Semantic  Data  Model  (SDM) 
Entity-Relationship  (E-E)  model 

2.  Implementation  data  models 
Relational  data  model 
Hierarchical  data  model 
Network  data  model 

One of the major responsibilities of the database
administrator is to develop a conceptual model of the
organization. The conceptual model is a communications tool
between the various users of data, and it is developed
without any concern for physical representation.

The conceptual model should be independent of a database
management system. The conceptual model has to be mapped to
the implementation model used as the underlying structure
for a DBMS. The commercial DBMSs available today are based
on a relational data model, a hierarchical data model,
a network data model, or a combination of them. It is
important to understand that the DBMS is not a factor in
designing a conceptual model, but designing an
implementation model is dependent on the DBMS to be used.

In reality, the DBMS is frequently given, and the
database administrator has no choice. The reason for this
situation is that a particular computer may support only one
or two DBMSs. Ideally, in contrast, the choice of the DBMS
should be made after the conceptual model is designed. The
process of mapping from the conceptual model to the
implementation model should be examined while evaluating
different DBMS packages. At that time, the DBMS should be a
dominant factor when selecting the computer.

B. CONCEPTUAL DATA MODELS

1. Semantic Data Model (SDM)

Contemporary DBMSs are based on database models
which have limited capabilities for expressing the meaning
of a database. These database models do not adequately
relate a database to its corresponding application
environment. Therefore, a database model is needed which
allows us to capture much more of the meaning of a database.
The semantic database model is a higher-level database model
and it is designed to provide features for the natural
modeling of database application environments. The semantic
data model provides a precise documentation and communication
medium for database users. More details of the semantic data
model will be given in Chapter V.

2.   The Entity-Relationship (E-R) Model

The  entity-relationship  model  is  a  conceptual  data 
model  and  is  based  on  the  view  that  the  real  world  consists 
of  entities  and  relationships  between  entities.  In  this 
model,  real  world  objects  and  their  characteristics  are 
represented  by  entities  and  their  attributes. 

An entity is a "thing" which may be distinctly identified;
examples are records of officers, units, and so forth. Individual
entities are classified into entity sets, that is, collections of
entities that may be described by the same set of properties.
Entities with the same attributes fall into one entity set. All
OFFICER records form the officer entity set; all UNIT records
form the unit entity set. A relationship set is an association
between two or more entity sets. The relationship may have its
own data (e.g., date of assignment, order number of assignment).
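
The entity sets and the relationship set described above can be
sketched in Python as follows. This is only an illustration under
stated assumptions: the class names, field names, and the helper
units_of are hypothetical, and the sample rows are taken from the
relationship data of Figure 4.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Officer:
    """One entity of the OFFICER entity set (field name is assumed)."""
    officer_id: str


@dataclass(frozen=True)
class Unit:
    """One entity of the UNIT entity set (field name is assumed)."""
    unit_id: str


@dataclass(frozen=True)
class Assignment:
    """One member of the relationship set between OFFICER and UNIT."""
    officer_id: str
    unit_id: str
    date_of_assignment: str   # data belonging to the relationship itself
    order_number: str         # data belonging to the relationship itself


officers = {Officer("10000"), Officer("31052")}
units = {Unit("35BT"), Unit("03CO")}
assignments = {
    Assignment("10000", "35BT", "790109", "125779-1"),
    Assignment("31052", "03CO", "790112", "363479-6"),
    Assignment("31052", "35BT", "801220", "563480-7"),
}


def units_of(officer_id):
    """Return the sorted unit IDs an officer has been assigned to."""
    return sorted(a.unit_id for a in assignments if a.officer_id == officer_id)
```

Note that the date and order number live in Assignment rather than
in Officer or Unit, because they describe the association and not
either entity on its own.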

Entities and relationships can be represented diagrammatically
by an entity-relationship (E-R) diagram. Each entity set is
represented by a rectangular box, and each relationship set by a
diamond-shaped box in this diagram. The diamond-shaped boxes
(relationship sets) are joined to the rectangular boxes of the
entity sets whose entities participate in the relationship.
Figure 4.1 presents a diagram that shows the relationship of the
officer and unit entity sets.

Figure 4.1    E-R Diagram for OFFICER/UNIT Relationship.

b. UNIT Data

OFFICER ID    UNIT ID    DATE OF ASSIGNMENT    ORDER NUMBER

10000         35BT       790109                125779-1
31052         03CO       790112                363479-6
31052         35BT       801220                563480-7
20112         03CO       810211                258281-6
10000         85DIV      830818                745683-3
32578         35BT       840830                563484-1

c. Relationship Data


Figure 4.2    Three Tables of Data for the E-R Diagram.

C.   IMPLEMENTATION DATA MODELS

Implementation  data  models  are  chosen  to  provide 
constructs  that  can  model  a  variety  of  user  problems.  Most 
commercial  database  management  systems  support  a  single  data 
model.  It  is  common  to  classify  these  models  into  three 
classes:

.The relational model

.The hierarchical model

.The network model

The main difference between the three classes of data models
lies in the representation of the relationships between the
entities.

1.   The Relational Data Model

In a relational data model, the entities and their
relationships are represented with two-dimensional tables. Every
table represents a relation and is made up of rows and columns.
Rows of such tables are generally referred to as tuples.
Likewise, columns are usually referred to as attributes. Figure
4.3 shows three relations, one for officers, one for units, and
the other for assignments.

Figure 4.3    Sample Data in Relational Form.

The relational data model approach is a high-level data
retrieval and manipulation tool that separates the user from the
complexity of storage structures, data structures, and access
paths. Access paths do not have to be predefined. The lack of
predefined physical access paths means that, unless auxiliary
structures such as indexes are available, a relational database
may have to be searched exhaustively to satisfy a query. The
advantages of a relational data model are its simplicity and its
well-developed theoretical foundation.
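
A minimal sketch of these ideas in Python is given below. It treats
a relation as a list of tuples and answers a query by scanning every
tuple, which is what the absence of predefined access paths amounts
to. The names Assignment, select, and project are assumptions made
for illustration; no particular DBMS is being described.

```python
from collections import namedtuple

# A row (tuple) of the assignment relation; each field is an attribute.
Assignment = namedtuple("Assignment", ["officer_id", "unit_id", "date"])

ASSIGNMENT = [
    Assignment("10000", "35BT", "790109"),
    Assignment("31052", "03CO", "790112"),
    Assignment("31052", "35BT", "801220"),
]


def select(relation, predicate):
    """Scan every tuple and keep those that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, *attributes):
    """Keep only the named attributes, dropping duplicate rows."""
    result = []
    for row in relation:
        reduced = tuple(getattr(row, name) for name in attributes)
        if reduced not in result:
            result.append(reduced)
    return result
```

For instance, select(ASSIGNMENT, lambda r: r.unit_id == "35BT")
returns the two assignments to unit 35BT without the user having to
know how the rows are stored.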

There are several commercially available DBMS packages based on
the relational model. Some of them are IBM's SQL/DS and System R,
Relational Software Inc.'s ORACLE, Relational Technology Inc.'s
INGRES, Britton-Lee Inc.'s IDM500, Honeywell's MRDS/LINUS,
Ashton-Tate's dBASE II, and National Computer Sharing Services'
NOMAD. The relational data model will be discussed in greater
detail in Chapter VI.

2.   Hierarchical Data Model

The hierarchical data model is made up of a hierarchy of entity
types involving a dominant (root) entity type and one or more
subordinate (dependent) entity types at the lower levels. The
relationship between a dominant and a subordinate entity type is
one-to-many. That is, for a given dominant entity type there may
be many subordinate entity types, and for a given dominant entity
occurrence there can be many occurrences of a subordinate entity
type.

The relative simplicity and ease of use of the hierarchical
data model and the familiarity of data processing users with
hierarchies are its major advantages. Disadvantages of a
hierarchical data model are:

.The operations of insertion and deletion are complex.

.Any subordinate node is accessible only through its dominant
node.
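
The second disadvantage can be illustrated with a small Python
sketch. The Node class, the find function, and the record keys are
all hypothetical; the point is only that a subordinate occurrence is
reached by walking down from its dominant occurrence.

```python
class Node:
    """One record occurrence in a hierarchy (hypothetical helper class)."""

    def __init__(self, key):
        self.key = key
        self.children = []   # subordinate occurrences: one-to-many

    def add(self, child):
        self.children.append(child)
        return child


def find(root, path):
    """Walk downward from the root along a list of keys.

    Returns the node reached, or None when some key on the path is
    missing. There is deliberately no way to jump straight to a
    subordinate node.
    """
    node = root
    for key in path:
        node = next((c for c in node.children if c.key == key), None)
        if node is None:
            return None
    return node


# Hypothetical occurrences: one unit with two subordinate officers.
root = Node("ROOT")
unit = root.add(Node("UNIT-1"))
unit.add(Node("OFFICER-1"))
unit.add(Node("OFFICER-2"))
```

Here find(root, ["UNIT-1", "OFFICER-2"]) succeeds, while
find(root, ["OFFICER-2"]) returns None because the officer is not
directly below the root.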

Some examples of commercially available DBMS packages based on
the hierarchical data model are IBM's IMS, Intel's SYSTEM 2000,
and Informatics' MARK IV.

3.   Network Data Model

The concept of the network data model is based on the work of
the CODASYL DBTG (Conference on Data Systems Languages, Database
Task Group). The network data model employs the set construct.
The term set here has a meaning different from its mathematical
sense.

The network model of a system is diagrammatically represented
by a data structure diagram, which was introduced by C. W.
Bachman. In this diagram a rectangle enclosing a name denotes an
entity or record type. Each record type is composed of data
items; the particular item names are not shown in the diagram,
although they are defined in the complete database description by
the data definition language. In a data structure diagram, a
directed arrow connects two record types. The record type located
at the tail of the arrow is called the owner-record type, and the
record type located at the head is called the member-record type.
The arrow directed from owner to member is called a set type and
it is given a name. Thus the data structure diagram in Figure 4.4
represents the set type ASSIGNED-TO. Here UNIT is the owner-record
type, and OFFICER is the member-record type.

The  existence  of  a  set  type  specifies  that  there  are 
associations  between  records  of  heterogeneous  types  in  the 
database.  This  allows  the  designer  to  interrelate  diverse 
record  types  and  thus  to  model  associations  between  diverse 
entities  in  the  real  world. 

There  is  a  distinction  between  a  set  type  and  a  set 
occurrence  as  well  as  between  record  type  and  record  occur- 
rence. For  example,  SMITH  and  DAVIS  denote  two  record  occur- 
rences within  the  record  type  OFFICER. 

Figure 4.4    A Network Structure.

The existence of a set type is declared by naming it and
stating its owner-record type (only one) and its member-record
type. A set occurrence is one occurrence of the owner-record type
together with zero or more occurrences of each member-record
type. This means that there is an occurrence of a set type
whenever there is an occurrence of its owner-record type. A set
occurrence is a one-to-many relationship that is the basic
building block for relating diverse records. The following
associations exist among the owner and member records of any set
occurrence:

.Given an owner record, it is possible to process the related
member records of that set occurrence.

.Given a member record, it is possible to process the related
owner record of that set occurrence.

.Given a member record, it is possible to process other member
records in the same set occurrence.

Any implementation that conforms to these three rules is a
valid implementation of the concept of a set type. Two
occurrences of the ASSIGNED-TO set type are shown in Figure 4.5.

Figure 4.5    Occurrences of Set Type ASSIGNED-TO.

A  set  occurrence  with  no  member-record  occurrences 
is  called  an  "empty  set."  A  given  member  record  may  exist  in 
only  one  set  occurrence  of  a  given  type.  A  member  record 
cannot  simultaneously  belong  to  two  owner  records  for  the 
same    set   type. 
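
The three rules and the membership restriction can be sketched in
Python as shown below. This is one possible implementation among
many, not the CODASYL specification: the SetType class and its
method names are hypothetical, and the pairing of officers SMITH and
DAVIS with unit 35BT is assumed purely for illustration.

```python
class SetType:
    """A named set type with one owner-record type (hypothetical sketch)."""

    def __init__(self, name):
        self.name = name
        self.members_of = {}   # owner -> members of its set occurrence
        self.owner_of = {}     # member -> owner

    def add_owner(self, owner):
        # Every owner occurrence has a set occurrence, possibly empty.
        self.members_of.setdefault(owner, [])

    def connect(self, owner, member):
        # A member may belong to only one occurrence of this set type.
        if member in self.owner_of:
            raise ValueError(f"{member} already belongs to a {self.name} set")
        self.add_owner(owner)
        self.members_of[owner].append(member)
        self.owner_of[member] = owner

    def members(self, owner):
        """Rule 1: from an owner record to its member records."""
        return list(self.members_of[owner])

    def owner(self, member):
        """Rule 2: from a member record to its owner record."""
        return self.owner_of[member]

    def siblings(self, member):
        """Rule 3: from a member record to the other members."""
        return [m for m in self.members_of[self.owner_of[member]] if m != member]


assigned_to = SetType("ASSIGNED-TO")
assigned_to.connect("35BT", "SMITH")
assigned_to.connect("35BT", "DAVIS")
assigned_to.add_owner("03CO")   # an empty set occurrence
```

Trying assigned_to.connect("03CO", "SMITH") raises ValueError,
because SMITH already belongs to the occurrence owned by 35BT.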

It  is  also  possible  to  implement  hierarchies  (one- 
to-many  relationships)  and  many-to-many  relationships  with 
set    structures    in   the  network    data  model. 

The major advantage of the network model is that there are
successful database management systems that use the network data
model as the basic structure. The main disadvantage of the
network model is its complexity.

There are several commercially available database management
system packages based on this model. Some of them are Burroughs'
DMS II, CDC's DMS-170, Cullinane's IDMS, Cincom's TOTAL,
Honeywell's IDS/II, Univac's DMS1100, and Digital Equipment
Corporation's DBMS-10/20.

V.  SEMANTIC DATA MODEL (SDM)

A.   INTRODUCTION

The semantic database model (SDM) was introduced by Hammer and
McLeod [Ref. 10:pp. 351-376], and can serve as a conceptual
database model in the database design process. The semantic
database model allows the same information to be viewed in
several ways.

Each database is a model of some real world environment. The
contents of a database are intended to reflect a snapshot of the
state of this real world environment, and every change to the
database should reflect an event occurring in that environment.
Therefore, a logical database represents selected portions of
reality. Eventually, we may ask questions like: How do we
represent the real world environment? What are the structures of
the real world environment? We may also ask questions about the
other aspect of the problem, such as: How do we represent the
conceptual world? What are the structures of the conceptual
world?

1.   Structures of Real World Environment

The  first  structure  is  the  object.  The  real  world 
has  objects;  they  are  phenomena  that  can  be  represented  by 
nouns. An officer, a unit, and an assignment_request are all
objects.  Objects  are  grouped  into  object  classes  by 
performing  generalization.  Objects  are  grouped  together  on 
the  basis  of  similarities.  OFFICER  is  an  example  of  an 
object  class. 

Objects have properties. A property is a characteristic of an
object. For example, an officer's name and rank are properties.
Properties are inherent in objects. The collection of all
possible values of a property is called a property value set. The
property value set for officer rank is the collection of all
ranks for all possible officer objects.

A  fact  is  an  assertion  that,  for  a  given  object,  a 
particular  property  has  a  particular  item  from  the  property 
value set. The statement that the rank of DAVIS is 'captain'
is  a  fact.  A  fact  is  the  intersection  of  a  given  object  with 
a  given  property  value  set. 

Objects can be related to one another. These relations are
called associations. Associations may exist between objects of
the same class or of different classes. The association
'commander' exists between objects of the same class (OFFICER).
The association 'assignment' exists between two different classes
(the OFFICER and UNIT object classes). Also, an object may have
an association to itself. Associations may have properties just
as objects have properties. The 'assignment' association may have
a property such as Date_of_assignment.

A  summary  of  real  world  structures  is  shown  in 
Figure  5.1. 

2.   Structures of Conceptual World

Database  designers  should  define  a  conceptual  struc- 
ture for  each  of  the  real  world  structures. 

An  entity  is  a  conceptual  representation  of  an 
object.  Entities  may  be  grouped  into  entity  classes.  An 
entity  class  is  a  representation  of  an  object  class.  An 
entity  class  consists  of  all  the  entities  that  represent  the 
objects  of  an  object  class.  If  there  is  an  object  class 
called  OFFICER,  then  there  can  be  an  entity  class  called 
OFFICER. 

Structure             Definition and Examples

Object                Phenomena that can be represented by
                      nouns. An officer, a unit.

Object Classes        A group of objects formed by
                      generalization. OFFICER, UNIT.

Properties            Characteristics of objects. Name, Rank.

Property value set    The collection of all possible values
                      of a given property.
                      All ranks for officers.

Fact                  The intersection of a given object
                      with a given property value set.
                      Rank of officer DAVIS is captain.

Association           A connection of objects of the same
                      or different classes.
                      DAVIS is assigned to unit ALPHA.


Figure 5.1    Structures in the Real World.

Entities  have  attributes  that  are  representations  of 
properties  of  objects.  Attributes  describe  and  characterize 
entities.  Rank, Name, Date_of_assignment  are  examples  of 
attributes. 

The conceptual structure that represents property value sets is
called a domain. A domain is the collection of all values that an
attribute can have. The domain of Names is a collection of
character strings of some appropriate length. The domain of
height (in centimeters) is the integers from 0 to 250.

A value is the representation of a fact. The value is the
intersection of a given entity with a given domain. A
relationship is the conceptual representation of an association.
Relationships may exist among entities in the same class or in
different classes. An entity may have a relationship to itself. A
relationship may have attributes, just as associations may have
properties. The assignment relationship may have an attribute
such as Date_of_assignment.

Associations between real world structures and conceptual
structures are shown in Figure 5.2.


Real World Structure      Conceptual Structure

Object                    Entity
Object Class              Entity Class
Property                  Attribute
Property value set        Domain
Fact                      Value
Association               Relationship

Figure 5.2    Real World and Conceptual Structures.


B.   GENERAL PRINCIPLES OF DESIGNING SDM

As described in [Ref. 10:p. 355], there are general principles
of database organization to support the design of SDM. These are:

"(1) A database is to be viewed as a collection of entities
that correspond to the actual objects in the application
environment.

(2) The entities in a database are organized into classes that
are meaningful collections of entities.

(3) The classes of a database are not in general independent,
but rather are logically related by means of interclass
connections.

(4) Database entities and classes have attributes that describe
their characteristics and relate them to other database entities.
An attribute value may be derived from other values in the
database.

(5) There are several primitive ways of defining interclass
connections and derived attributes, corresponding to the most
common types of information redundancy appearing in database
applications. These facilities integrate multiple ways of viewing
the same basic information, and provide building blocks for
describing complex attributes and interclass relationships."


C.   DEFINING ENTITY CLASSES

The basic format of an SDM entity class description is shown in
Figure 5.3 [Ref. 6:p. 213].


ENTITY_CLASS_NAME
   [description: ______]
   [interclass connection: ______]
   member attributes:
      Attribute_name
         [description: ______]
         value class: ______
         [mandatory]
         [multivalued] [no overlap in values]
         [exhausts value class]
         [not changeable]
         [inverse: Attribute_name]
         [match: Attribute_name1 of ENTITY_CLASS
                 on Attribute_name2]
         [derivation: ______]
   [class attributes:
      Attribute_name
         [description: ______]
         value class: ______
         [derivation: ______] ]
   [identifiers:
      Attribute_name1 + [Attribute_name2 + [...] ] ]


Figure 5.3    Format of SDM Entity Class Descriptions.

An SDM database is a collection of entities. Entities are
organized into classes. The structure and organization of an SDM
database are specified by an SDM schema, which identifies the
classes in the database. Appendix A is an example of an SDM
schema for the 'Personnel Database.'

Each entity class in an SDM schema has the following features:

1.  A class name identifies the class. Each class name must be
unique with respect to all class names used in a schema. OFFICER,
UNIT, and ASSIGNMENT_REQUEST are all class names.

2.  The  class  has  a  collection  of  members  (the  entities)  . 
Each  class  in  an  SDM  schema  is  a  homogeneous  collec- 
tion of  one  type  of  entity. 

3.  A textual class description is an optional feature of an
entity class. It describes the meaning and contents of the class.

4.  The class has several attributes which describe the members
of that class or the class as a whole. There are two types of
attributes: member attributes and class attributes. For example,
each member of class UNIT has the attributes Name, Unit_category,
and Location, which identify the unit's name, its category, and
its location, respectively. A class attribute describes a
property of a class taken as a whole. For example, the class
ASSIGNMENT_REQUEST has the attribute Number_of_requests, which
identifies the number of requests issued in the current year.

5.  An SDM class can either be base or nonbase. A base class is
one that is defined independently of all other classes in the
database. In Appendix A the class OFFICER is a base class; it
exists independently of other classes. An entity class such as
COMMANDERS, by contrast, is a subset of the OFFICER class, and so
can be derived from this class. The class COMMANDERS is called a
nonbase class, and it does not have independent existence. Every
nonbase class has an entry, interclass connection, that describes
how the class is to be constructed.

6.  If the class is a base class, it has identifiers. These are
attributes that uniquely identify members. For example, class
OFFICER has the unique identifier Military_ID.
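
The distinction between base and nonbase classes, together with
identifiers and class attributes, can be sketched in Python as
follows. EntityClass is a hypothetical and much simplified stand-in
for an SDM class, and the member attribute Is_commander is an
assumption introduced only so that COMMANDERS can be derived.

```python
class EntityClass:
    """A much simplified, hypothetical stand-in for an SDM class."""

    def __init__(self, name, identifier=None, base=None, predicate=None):
        self.name = name
        self.identifier = identifier   # identifying attribute (base class)
        self.base = base               # parent class (nonbase class)
        self.predicate = predicate     # plays the role of the interclass connection
        self._members = []

    def insert(self, entity):
        if self.base is not None:
            raise TypeError("members of a nonbase class are derived")
        key = entity[self.identifier]
        if any(m[self.identifier] == key for m in self._members):
            raise ValueError(f"duplicate identifier {key!r}")
        self._members.append(entity)

    def members(self):
        if self.base is None:
            return list(self._members)
        # A nonbase class has no independent existence: it is computed.
        return [m for m in self.base.members() if self.predicate(m)]

    def number_of_members(self):
        """A class attribute: one value for the class as a whole."""
        return len(self.members())


OFFICER = EntityClass("OFFICER", identifier="Military_ID")
COMMANDERS = EntityClass(
    "COMMANDERS", base=OFFICER, predicate=lambda o: o["Is_commander"]
)
OFFICER.insert({"Military_ID": "10000", "Is_commander": True})
OFFICER.insert({"Military_ID": "31052", "Is_commander": False})
```

Because COMMANDERS is derived on demand, any change to OFFICER is
reflected in it automatically.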

D.   DEFINING ATTRIBUTES

There  is  a  collection  of  attributes  in  each  class 
description.  These  attributes  represent  the  properties  of 
objects.  Each  attribute  has  the  following  features. 

1.  An attribute name identifies the attribute. Attribute names
must be unique within the class where they are defined, and also
within all classes that are derived from their class of
definition. Date_of_promotion and Main_branch are examples of
attribute names.

2.  The attribute has a value which is either an entity in the
database (a member of some class) or a collection of such
entities. The value of an attribute is selected from its
underlying value class. Value class is another term for domain;
it contains the permissible values of the attribute. The value
class of an attribute may be any class in the schema, and the
value of an attribute may also be the special value NULL (i.e.,
no value). DATE and BRANCHES are examples of value classes.

3.  The applicability of the attribute is specified by
indicating that the attribute is either:

(a)  a member attribute, which applies to each member of the
class, and so has a value for each member (e.g., Military_ID of
OFFICER), or

(b)  a class attribute, which applies to a class as a whole, and
has only one value for the class (e.g., Number_of_requests of
ASSIGNMENT_REQUEST).

4.  A textual attribute description is an optional feature that
describes the meaning and purpose of the attribute. This serves
as an integrated form of database documentation.

5.  The  attribute  is  specified  as  either  single  valued  or 
multivalued.  A  single-valued  attribute  has  one 
value,  that  is,  a  member  of  the  value  class  of  the 
attribute.  The  value  of  a  multivalued  attribute  is  a 
subclass  of  the  value  class  (e.g., 
Foreign_language_capability  of  OFFICER  is  a  multiva- 
lued attribute.)  The  default  value  for  this  feature 
is  single   valued. 

6.  An attribute can be specified as mandatory, which means that
a null value is not allowed for it. For example, attribute
Military_ID of OFFICER is specified as "mandatory"; this models
the fact that every OFFICER has a Military_ID.

7.  An  attribute  can  be  specified  as  not  changeable, 
which  means  that  once  set  to  a  nonnull  value,  this 
value  cannot  be  altered  except  to  correct  an  error. 
For  example,  attribute  Military_ID  of  OFFICER  is 
specified   as    "not  changeable." 

8.  A  member  attribute  can  be  required  to  be  exhaustive 
of  its  value  class.  This  means  that  every  member  of 
the    value  class  of   the    attribute   is    used. 

9.  Finally, multivalued attributes can be specified as
nonoverlapping. This means that a member of the value class can
be used at most once.
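
Several of these features can be read as constraints to be checked
against the data. The sketch below, written in Python, checks the
mandatory, exhaustive, and nonoverlapping features for one member
attribute; the "not changeable" feature is omitted because it
concerns updates over time. The function name check_attribute, its
parameters, and the sample languages are assumptions made for
illustration only.

```python
def check_attribute(values_by_member, value_class, *, mandatory=False,
                    multivalued=False, exhaustive=False, no_overlap=False):
    """Return a list of constraint violations for one member attribute.

    values_by_member maps each member to its value: a single value (or
    None for null) when single valued, a set of values when multivalued.
    """
    problems = []
    used = []
    for member, value in values_by_member.items():
        if multivalued:
            values = set(value or ())
        else:
            values = {value} - {None}
        if mandatory and not values:
            problems.append(f"{member}: null value not allowed")
        for v in values:
            if v not in value_class:
                problems.append(f"{member}: {v!r} not in value class")
        used.extend(values)
    # Exhaustive: every member of the value class is used by some member.
    if exhaustive and not set(value_class) <= set(used):
        problems.append("value class not exhausted")
    # Nonoverlapping: no member of the value class is used twice.
    if no_overlap and len(used) != len(set(used)):
        problems.append("a value is used more than once")
    return problems


LANGUAGES = {"French", "German"}   # hypothetical value class
capability = {"10000": {"French"}, "31052": {"German"}}
```

With these sample values,
check_attribute(capability, LANGUAGES, multivalued=True,
exhaustive=True, no_overlap=True) reports no violations.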




E.   MEMBER ATTRIBUTE INTERRELATIONSHIPS

The  semantic  data  model  provides  three  facilities  for 
defining interrelationships among member attributes. These
facilities  are  inversion,  matching,  and  derivation. 

1.   Inversion

The inverse facility causes two entities to be contained within
each other. Member attribute A1 of class C1 can be specified as
the inverse of member attribute A2 of C2, which means that the
value of A1 for a member M1 of C1 consists of those members of C2
whose value of A2 is M1.

Inverses are always specified by a pair of attributes which
establishes a binary association between the members of the
classes. For example, in Appendix A the entity classes OFFICER
and UNIT are related by a pair of inverse attributes. In OFFICER,
the attribute Unit_assigned has the value class UNIT and the
inverse attribute Officer_assigned. In UNIT, the attribute
Officer_assigned has the value class OFFICER and the inverse
attribute Unit_assigned.
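
The definition of inversion can be made concrete with a short Python
sketch. The function name invert is hypothetical, and the particular
officers and their units are assumed sample values; only the rule
itself comes from the text.

```python
def invert(attribute):
    """Compute the inverse of a single-valued member attribute.

    attribute maps each member of C2 to its A2 value (a member of C1).
    The result maps each member M1 of C1 to the set of members of C2
    whose value of A2 is M1.
    """
    inverse = {}
    for m2, m1 in attribute.items():
        inverse.setdefault(m1, set()).add(m2)
    return inverse


# Unit_assigned of OFFICER (sample values are assumptions).
unit_assigned = {"SMITH": "35BT", "DAVIS": "35BT", "JONES": "03CO"}

# Officer_assigned of UNIT follows from it by inversion.
officer_assigned = invert(unit_assigned)
```

Only one of the two attributes needs to be stored; the other can
always be computed from it, which is why the pair is said to
describe the same binary association.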

2.   Matching

The second SDM facility for representing relationships is
matching. With matching, a member of one entity class is matched
with a member of another entity class. The value of the match
attribute A1 for the member M1 of class C1 is determined as
follows.

1.   A member M2 of some class C2 is found that has M1 as its
value of member attribute A2.

2.   The value of member attribute A3 for M2 is used as the value
of A1 for M1.

For a multivalued attribute (call it A1), it is permissible for
each member of C1 to match to several members of C2; in this
case, the collection of A3 values is the value of attribute A1.
In other words, the value of an attribute in one of the members
is moved to the other.

Inversion  and  matching  provide  multiple  ways  of 
viewing  n-ary  associations  among  entities.  Matching  supports 
binary  and  higher  degree  associations,  while  inversion 
allows  the  specification  of  binary  associations. 

For example, a matching specification in Appendix A indicates
that the value of the attribute Foreign_language_capability of a
member O of class OFFICER is equal to the value of attribute
Foreign_language of the member F of class FOREIGN_LANGUAGE whose
FID value is O.
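
The two matching steps translate directly into a short Python
function. The function name match and the sample FOREIGN_LANGUAGE
members are assumptions made for illustration; the attribute names
FID and Foreign_language are those used in the example above.

```python
def match(m1, c2_members, a2, a3):
    """Value of a multivalued match attribute for member m1 of C1.

    Step 1: find every member M2 of C2 whose a2 value is m1.
    Step 2: collect the a3 value of each such M2.
    """
    return [m2[a3] for m2 in c2_members if m2[a2] == m1]


# Hypothetical members of class FOREIGN_LANGUAGE.
FOREIGN_LANGUAGE = [
    {"FID": "10000", "Foreign_language": "French"},
    {"FID": "10000", "Foreign_language": "German"},
    {"FID": "31052", "Foreign_language": "French"},
]

# Foreign_language_capability of the officer identified by 10000.
capability = match("10000", FOREIGN_LANGUAGE, "FID", "Foreign_language")
```

An officer with no matching FOREIGN_LANGUAGE member simply gets an
empty collection.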

3.   Derivation

SDM  provides  the  ability  to  define  an  attribute 
whose  value  is  calculated  from  other  information  in  the 
database.  Such  an  attribute  is  named  derived. 

The approach is to provide a small vocabulary of
high-level  attribute  derivation  primitives  that  directly 
model  the  most  common  types  of  derived  information.  Each  of 
these  primitives  provides  a  way  of  specifying  one  method  of 
computing  a  derived  attribute. 

F.   CLASS ATTRIBUTE INTERRELATIONSHIPS

Attribute derivation primitives for member attributes can be
used to define derived class attributes, as these primitives
derive attribute values from those of other attributes. Of
course, instead of deriving the value of a member attribute from
the value of other member attributes, the class attribute
primitives will derive the value of a class attribute from the
value of other class attributes. Moreover, there are two other
primitives that can be used in the definition of derived class
attributes:

1.  An attribute can be defined so that its value equals the
number of members in the class it modifies. For example,
attribute Number_of_requests is derived from the
ASSIGNMENT_REQUEST class by counting its members, as specified.

2.  An  attribute  can  be  defined  whose  value  is  a  function 
of  a  numeric  member  attribute  of  a  class;  the  func- 
tions available  are  "maximum",  "minimum",  "average", 
and  "sum"  taken  over  a  member  attribute.  The  compu- 
tation of  the  function  is  made  over  the  members  of 
the  class. 
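
Both primitives can be sketched in a few lines of Python. The
function names and the member attribute Priority are hypothetical;
Priority is introduced only so that there is a numeric member
attribute to aggregate.

```python
def number_of_members(members):
    """Class attribute whose value is the number of members in the class."""
    return len(members)


def aggregate(members, attribute, function):
    """Class attribute computed over a numeric member attribute."""
    values = [member[attribute] for member in members]
    functions = {
        "maximum": max,
        "minimum": min,
        "sum": sum,
        "average": lambda v: sum(v) / len(v),
    }
    return functions[function](values)


# Hypothetical members of ASSIGNMENT_REQUEST; "Priority" is an assumed
# numeric member attribute used only for illustration.
ASSIGNMENT_REQUEST = [
    {"Request_no": 1, "Priority": 4},
    {"Request_no": 2, "Priority": 8},
    {"Request_no": 3, "Priority": 12},
]
```

Here number_of_members(ASSIGNMENT_REQUEST) plays the role of
Number_of_requests, and aggregate(ASSIGNMENT_REQUEST, "Priority",
"average") is an example of the second primitive.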

VI.  RELATIONAL DATABASE MODEL

A.   RELATIONAL DATA STRUCTURE

In order to explain the relational data structure, it is
helpful to use sample data in relational form. Figure 6.1
reflects a relational view of the data, which is organized into
three tables: OFFICER (officers who are in the army), COURSES
(all courses which are offered), and COURSE_ATTENDED (officers
who took some courses). The OFFICER table contains, for each
officer, a military identification number, rank, name, and the
city where the officer was born; the COURSES table contains, for
each course, a course code, course name, brief description of
that course, duration, and location where the course is offered;
and the COURSE_ATTENDED table contains, for each course attended,
a military identification number, a course code, and the grade
received. The following assumptions regarding officers, courses,
and courses attended are made. Each officer has a unique military
ID number and exactly one rank, name, and city name. Each course
has a unique course code and exactly one course name,
description, duration, and location. At any given time, no more
than one grade exists for a given officer/course combination.

1.   Definition of a Relation

Assume that we are given a collection of sets E1, E2, ..., En
(they are not necessarily distinct). R is a relation on those n
sets if it is a set of ordered n-tuples <e1, e2, ..., en> such
that e1 belongs to E1, e2 belongs to E2, ..., en belongs to En.
Sets E1, E2, ..., En are the domains of R. The value n is the
degree of R; it is sometimes called the arity of R [Ref. 3:pp.
83-93], [Ref. 5:pp. 14-25].

Figure 6.1    Sample Data in Relational Form.

From the mathematical set-theory perspective, we can give
another equivalent definition of a relation that is sometimes
useful. A relation is any subset of the Cartesian product of one
or more domains. For example, if we have n sets, say n=2,
E1={a,b}, and E2={0,1,2}, then E1 x E2 is the Cartesian product
of these n sets. That is, it is the set of all possible ordered
n-tuples <e1,e2> such that e1 belongs to E1 and e2 belongs to E2.
The result of this Cartesian product E1 x E2 is
{(a,0), (a,1), (a,2), (b,0), (b,1), (b,2)}.
Figure 6.2, for example, shows the Cartesian product of two
sets, MID and CCODE (Military ID No. and Course Code).

MID    CCODE

ID1    C1
ID1    C2
ID1    C3
ID2    C1
ID2    C2
ID2    C3

Figure 6.2    An Example of a Cartesian Product.
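
The set-theoretic definition can be checked with a few lines of
Python, using itertools.product from the standard library to form
the Cartesian product. The function name is_relation is an
assumption; the sets E1 and E2 are those of the example above.

```python
from itertools import product

E1 = ["a", "b"]
E2 = [0, 1, 2]

# The Cartesian product E1 x E2: every ordered pair <e1, e2>.
cartesian = list(product(E1, E2))


def is_relation(candidate, *domains):
    """True if candidate is a subset of the product of the domains."""
    return set(candidate) <= set(product(*domains))


# The product of MID and CCODE from Figure 6.2 has 2 x 3 = 6 tuples.
mid_ccode = list(product(["ID1", "ID2"], ["C1", "C2", "C3"]))
```

Any subset of cartesian, including the empty set and cartesian
itself, is a relation on E1 and E2; a set containing ("c", 0) is
not, because "c" does not belong to E1.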

A relation called COURSES, of degree 5, is illustrated in
Figure 6.1 (b). The five domains are sets of values representing,
respectively, course codes (CCODE), course titles (TITLE), brief
descriptions of each course (CDESCRIPT), the duration of each
course, and the locations where courses are offered. The "course
title" domain, for example, is the set of all valid course
titles; note that there may be some titles included in this
domain that do not actually appear in the COURSES relation at
this particular moment.

It is convenient to view a relation as a table, where each row
is a tuple and each column corresponds to one component. The
columns are often given names, called attributes. The number of
tuples in a relation is called the cardinality of that
relation; e.g., the cardinality of the COURSES relation is
four, and it has five attributes (or columns). As mentioned
earlier, a domain can be thought of as a pool of values from
which the actual values for a given attribute are drawn. It is
very important to note that the domains of a relation do have
an ordering defined among them. If we have a tuple
(a1, a2, ..., an) with n components, the value of the jth
component in this n-tuple has to be drawn from the jth domain.
In Figure 6.1 (b), (C2, Cobol, Cobol Prog. Language, 8,
Monterey) is the second tuple of the COURSES relation, and the
value of the fourth component of this tuple, under the
attribute named DURATION, is drawn from the fourth domain,
which is a set of non-negative integers ranging from 0 to 999.
Mathematically speaking, rearranging the five columns of the
COURSES relation into some different order results in a
different relation.

It is important to note the difference between a domain and the
attributes which are drawn from that domain. An attribute
represents the use of a domain within a relation. Figure 6.3
shows a part of a relational schema in which four domains
(MILITARY_ID, MILITARY_RANK, OFFICER_NAME, and LOCATION) and
one relation (OFFICER) have been defined by using a data
definition language [Ref. 3:pp. 83-93]. The relation is
declared with four attributes (MID, RANK, NAME, and CITY), and
each attribute is specified as being drawn from a corresponding
domain. It is sometimes possible that the domains of more than
one attribute are the same. In other words, those attributes
can use the same domain in common. To differentiate between
attributes that have the same domain, each is given a unique
attribute name. A crucial feature of the relational data
structure is that associations between tuples are represented
solely by data values in attributes (columns) drawn from a
common domain.

All relations in a relational database are required to satisfy
the following condition.

"Every value in the relation (i.e., each attribute value in
each tuple) is atomic (i.e., nondecomposable so far as the
system is concerned)." [Ref. 3:p. 86]

That is, at every row-and-column position in the table there
always exists precisely one value, never a set of values.
"Unknown" or "inapplicable" values, however, can be represented
by null values in a relation. This is the idea of
normalization: if a relation satisfies the above condition, it
is said to be normalized. This idea will be discussed in detail
later.

Figure 6.3    Domains and Attributes.

The generalized format or notation which is used to represent a
relation is called the relation structure. For example,
OFFICER (Mid, Rank, Name, City) is the structure of the OFFICER
relation. In general, Relation_name (attribute1, attribute2,
..., attributeN) is the format used to show the structure of a
relation. If we add constraints on allowable data values to the
relation structure, we then have a relational schema
[Ref. 11].

2.   Keys

It is frequently the case that within a given relation there is
one attribute with values that are unique within the relation
and thus can be used to identify the tuples of that relation.
Attribute MID of the OFFICER relation, for example, has this
property. Each OFFICER tuple contains a distinct MID value, and
this value may be used to distinguish that tuple from all
others in the relation. MID is called the primary key for
OFFICER.

A single attribute may not always serve as the primary key of a
relation. In that case, the values of more than one attribute
together may constitute a unique identifier. Thus, some
combination of attributes, when taken together, has the unique
identification property. In the relation COURSE_ATTENDED
(Fig. 6.1 (c)), for example, the combination (MID, CCODE) has
this property. The existence of such a combination is
guaranteed by the fact that a relation is a set. Since sets do
not contain duplicate elements, each tuple of a given relation
is unique with respect to that relation, and hence at least the
combination of all attributes has the unique identification
property. In the above example, the combination (MID, CCODE) is
said to be a composite key as well as a primary key for the
COURSE_ATTENDED relation.

On the other hand, occasionally we may encounter a relation in
which there is more than one attribute combination having the
unique identification property and hence more than one
candidate key. In such a case we may arbitrarily choose one of
the candidates as the primary key for the relation. If a
candidate key is not the primary key, it is called an alternate
key [Ref. 3:pp. 83-93]. The COURSES relation in Fig. 6.1 (b) is
such a relation. Each course has a unique course code and a
unique course name (TITLE). If the designer chooses one of
these candidate keys, say CCODE, as the primary key for the
relation, TITLE will be an alternate key.
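
The unique identification property can be checked mechanically against the tuples currently stored in a relation. The sketch below is only illustrative: the COURSES tuples are hypothetical (they are not the data of Figure 6.1), and the function names are assumptions. Note also that a key is really a property of what the relation is declared to mean, so a check against current tuples can only refute a proposed key, never prove it.

```python
from itertools import combinations


def has_unique_identification(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, attributes):
    """Minimal attribute combinations that are unique in the given rows."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Supersets of a key already found are not minimal; skip them.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if has_unique_identification(rows, combo):
                keys.append(combo)
    return keys


# Hypothetical COURSES tuples, for illustration only.
courses = [
    {"CCODE": "C1", "TITLE": "Pascal", "LOCATION": "Monterey"},
    {"CCODE": "C2", "TITLE": "Cobol", "LOCATION": "Monterey"},
    {"CCODE": "C3", "TITLE": "Ada", "LOCATION": "Indianapolis"},
]

print(candidate_keys(courses, ["CCODE", "TITLE", "LOCATION"]))
```

With these tuples both CCODE and TITLE qualify, so a designer who picks CCODE as the primary key leaves TITLE as an alternate key.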

The primary key is a unique identifier for tuples in a
relation. Those tuples represent entities in the real world,
and the primary key really serves as a unique identifier for
those entities. For example, the tuples in the OFFICER relation
represent individual officers, and values of the MID attribute
actually identify those officers, not just the tuples that
represent them. As a result of this interpretation, we can now
introduce the following rule.

"Integrity Rule 1 (Entity integrity)

No component of a primary key value may be null."

[Ref. 3:p. 89]

According to the definition, all entities must have a unique
identification of some kind. That is, they must be
distinguishable from each other. Primary keys perform the
unique identification function in a relational database. If a
primary key value is null in a relation, this implies that
there is some entity that does not have a unique
identification. In other words, it is not distinguishable from
other entities. It is strongly recommended that both wholly and
partially null identifiers be prohibited.

Those types of arguments lead us to a second integrity rule.
Occasionally one relation includes references to another.
Relation COURSE_ATTENDED, for example, includes references to
both the OFFICER relation and the COURSES relation, via its MID
and CCODE attributes. It is clear that if a tuple of
COURSE_ATTENDED contains a value for MID, say ID2, then a tuple
for officer ID2 should exist in OFFICER. Otherwise, the
COURSE_ATTENDED tuple would refer to a nonexistent officer; and
similarly for courses. To make these notions clear, we should
understand the notion of primary domain.

"A given domain may optionally be designated as primary if and
only if there exists some single-attribute primary key defined
on that domain." [Ref. 3:p. 89]

For example, we may designate the domain MILITARY_ID as
primary, by extending its definition shown in Fig. 6.3 as
follows:

DOMAIN MILITARY_ID CHARACTER(9) PRIMARY

Any relation which contains an attribute that is defined on a
primary domain (for example, relation COURSE_ATTENDED) must
obey the following rule.

"Integrity Rule 2 (Referential integrity)
Let D be a primary domain, and let R1 be a relation with an
attribute A that is defined on D. Then, at any given time, each
value of A in R1 must be either (a) null, or (b) equal to V,
say, where V is the primary key value of some tuple in some
relation R2 (R1 and R2 not necessarily distinct) with primary
key defined on D." [Ref. 3:p. 89]

Here, relation R2 must exist because of the definition of
primary domain, and if attribute A is the primary key of R1,
the rule is trivially satisfied. When an attribute such as A in
one relation is a key of another relation, the attribute is
called a foreign key. For example, attribute CCODE of relation
COURSE_ATTENDED is a foreign key, because its values are values
of the primary key of the COURSES relation.
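
The two integrity rules lend themselves to simple checks. The following is a minimal sketch rather than a DBMS facility: nulls are modelled as Python `None`, the tuples are hypothetical, and both function names are assumptions made for this illustration.

```python
def violates_entity_integrity(rows, primary_key):
    """Rows in which some component of the primary key is null (None)."""
    return [r for r in rows if any(r[a] is None for a in primary_key)]


def violates_referential_integrity(rows, foreign_key, target_rows, target_key):
    """Rows whose foreign key value is neither null nor the primary key
    value of some tuple in the target relation."""
    valid = {r[target_key] for r in target_rows}
    return [
        r for r in rows
        if r[foreign_key] is not None and r[foreign_key] not in valid
    ]


# Hypothetical tuples, for illustration only.
officer = [{"MID": "ID1", "NAME": "Smith"}, {"MID": "ID2", "NAME": "Jones"}]
course_attended = [
    {"MID": "ID1", "CCODE": "C1", "GRADE": "A"},
    {"MID": "ID3", "CCODE": "C2", "GRADE": "B"},  # ID3 is not in OFFICER
]

print(violates_entity_integrity(officer, ["MID"]))
print(violates_referential_integrity(course_attended, "MID", officer, "MID"))
```

The second COURSE_ATTENDED tuple is reported because it refers to an officer that does not exist in OFFICER.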

3.   Extensions and Intensions

An extension and an intension are the two components of a
relation in a relational database.

The set of tuples existing in a given relation at any given
instant is known as the extension of that relation. Thus the
extension changes with time; that is, it varies as tuples are
added, deleted, and updated.

The intension of a given relation is the permanent part of the
relation. It is independent of time. The intension corresponds
to what is declared in the relational schema. Hence, the
intension is the combination of the relation structure
(sometimes called the naming structure) mentioned earlier and
the integrity constraints, which can be subdivided into key
constraints, referential constraints, and other constraints.
Key constraints are constraints implied by the existence of
candidate keys. The primary key and alternate key
specifications included in the intension imply a uniqueness
constraint (by the definition of candidate key), and the
primary key specification also implies a no-nulls constraint
(by Integrity Rule 1). Referential constraints are constraints
implied by the existence of foreign keys. A specification of
all foreign keys in the relation implies a referential
constraint (by Integrity Rule 2). The relations in Figure 6.1
are examples of extensions, and they also show the intensional
relation (or naming) structure, which consists of the relation
name plus the names of the attributes. The operational data
appearing under those attributes are the extension part of
those tables.

B.   RELATIONAL ALGEBRA

Relational algebra is a collection of operations on relations.
Each operation takes one or more relations as its operand(s)
and produces another relation as its result. The statement
A := B + C; for example, is an arithmetic assignment in the
Pascal programming language. B and C are variables that serve
as operands of the addition operator (+). After the operation
is performed, the result is assigned to the variable A.
Likewise, in the relational algebra we may take B and C to be
two relations and read the plus sign (+) as the union operator.
After this operation is performed, A is a new relation produced
as its result.

The relational algebra basically consists of two groups of
operators: the set operators union, difference, intersection,
and product; and the special relational operators selection,
projection, join, and division. These operations are very
important for understanding the higher-level relational
languages, such as SQL and QBE, which will be discussed later
in this chapter.

1.   Set Operators

The traditional set operators are union, difference,
intersection, and Cartesian product. The two relations used as
operands must be union-compatible for all except Cartesian
product. This means that each relation must have the same
number of attributes (same degree), and the attributes in
corresponding columns must come from the same domain (the names
of the attributes need not be the same).
[Ref. 3:pp. 203-215], [Ref. 6:pp. 242-282]

-Union

The union of two relations is formed by combining the tuples
from one relation with those of a second relation to produce a
third. In other words, the union of two relations A and B,
A UNION B, is the set of all tuples t belonging to either A or
B (or both). Duplicate tuples are eliminated. For example, let
A be the set of officer tuples for officers stationed in
Monterey, and B the set of officer tuples for officers who took
course C2. Then A UNION B is the set of officer tuples for
officers who either are stationed in Monterey or took course C2
(or both).

-Difference

The difference of two relations is a third relation containing
tuples which occur in the first relation but not in the second.
That is, the difference between two (union-compatible)
relations A and B, A MINUS B, is the set of all tuples t
belonging to A and not to B. For example, let A and B be the
same sets as in the example under "Union". Then A MINUS B is
the set of officer tuples for officers who are stationed in
Monterey and who did not take course C2.

-Intersection

The intersection of two relations is a third relation
containing their common tuples. Again, the relations must be
union-compatible. Mathematically speaking, the intersection of
two relations A and B, A INTERSECT B, is the set of all tuples
t belonging to both A and B. Let A and B again, for example, be
as in the example under "Union" above. Then A INTERSECT B is
the set of officer tuples for officers who are stationed in
Monterey and took course C2.

-Cartesian product

The Cartesian product of two relations is the concatenation of
every tuple of one relation with every tuple of a second
relation. Let A and B be two relations. Then A TIMES B, or
A x B, is the set of all tuples t such that t is the
concatenation of a tuple "a" belonging to A and a tuple "b"
belonging to B. The concatenation of a tuple a = (a1,...,aM)
and a tuple b = (b1,...,bN), in that order, is the tuple
t = (a1,...,aM,b1,...,bN). For example, let A be the set of all
officers' military identification numbers, and B the set of all
course code numbers. Then A TIMES B is the set of all possible
military_ID_number/course_code pairs.
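
Because relations are sets of tuples, the four set operators map directly onto Python's built-in set type. The sketch below is illustrative only: the officer tuples are hypothetical, and the variable names are assumptions. Both operands of the first three operators share the heading (MID, NAME, CITY), so they are union-compatible.

```python
from itertools import product

# Hypothetical OFFICER tuples (MID, NAME, CITY).
in_monterey = {("ID1", "Smith", "Monterey"), ("ID2", "Jones", "Monterey")}
took_c2 = {("ID2", "Jones", "Monterey"), ("ID3", "Brown", "Norfolk")}

union = in_monterey | took_c2          # A UNION B; duplicates are eliminated
difference = in_monterey - took_c2     # A MINUS B
intersection = in_monterey & took_c2   # A INTERSECT B

# A TIMES B: concatenate every tuple of A with every tuple of B.
mids = {("ID1",), ("ID2",)}
ccodes = {("C1",), ("C2",), ("C3",)}
times = {a + b for a, b in product(mids, ccodes)}

print(sorted(union))
print(sorted(difference))
print(sorted(intersection))
print(sorted(times))
```

Tuple concatenation is written `a + b`, which mirrors the definition of the product as the concatenation of a tuple of A with a tuple of B, in that order.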

2.   Special Relational Operations

-Projection

Projection is an operation that selects specified attributes
from a given relation. The result of the projection is a new
relation having the selected attributes. In other words, the
projection operator creates a "vertical" subset of a given
relation, obtained by selecting specified attributes, in a
specified left-to-right order, and then eliminating duplicate
tuples within the attributes selected. Projection can also be
used to change the order of attributes in a relation. For
example, consider the COURSE_ATTENDED relation in Figure 6.1
(c). The projection of COURSE_ATTENDED on the Ccode and Grade
attributes, denoted with brackets as COURSE_ATTENDED {Ccode,
Grade}, is shown in Figure 6.4. Note that although
COURSE_ATTENDED has eight tuples to begin with, the projection
COURSE_ATTENDED {Ccode, Grade} has only six. Two tuples were
eliminated because the tuples (C1, A-) and (C3, A-) each
occurred twice after the projection was done. An example of
reordering the attributes of the OFFICER relation is the
projection OFFICER {City, Name, Rank, Mid}.


CCODE    GRADE
-----    -----
C1       B+
C2       A
C3       A-
C1       A-
C2       B
C4       C+

Figure 6.4    Projection of COURSE_ATTENDED Relation.
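
Projection can be sketched as follows. The eight COURSE_ATTENDED tuples below are hypothetical: their (CCODE, GRADE) pairs were chosen so that the projection reproduces Figure 6.4, but the MID values are assumptions, since Figure 6.1 (c) is not reproduced here. The function name `project` is likewise an assumption.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, in the given order,
    eliminating duplicate tuples while keeping first-seen order."""
    seen = set()
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


# Hypothetical COURSE_ATTENDED tuples; MID values are illustrative.
course_attended = [
    {"MID": "ID1", "CCODE": "C1", "GRADE": "B+"},
    {"MID": "ID1", "CCODE": "C2", "GRADE": "A"},
    {"MID": "ID1", "CCODE": "C3", "GRADE": "A-"},
    {"MID": "ID2", "CCODE": "C1", "GRADE": "A-"},
    {"MID": "ID2", "CCODE": "C2", "GRADE": "B"},
    {"MID": "ID3", "CCODE": "C1", "GRADE": "A-"},
    {"MID": "ID3", "CCODE": "C3", "GRADE": "A-"},
    {"MID": "ID3", "CCODE": "C4", "GRADE": "C+"},
]

for ccode, grade in project(course_attended, ["CCODE", "GRADE"]):
    print(ccode, grade)
```

Passing the attributes in a different order, for example `["GRADE", "CCODE"]`, reorders the columns of the result in the same way that OFFICER {City, Name, Rank, Mid} reorders OFFICER.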

-Selection

The selection operator yields a "horizontal" subset (rows) of a
given relation. In other words, selection identifies tuples to
be included in the new relation. Selection is denoted by
specifying the relation name, followed by the keyword WHERE,
followed by a conditional statement involving attributes. The
condition is a single Boolean expression or a combination of
Boolean expressions. Figure 6.5 (a) shows the selection
COURSES WHERE LOCATION = 'MONTEREY'. Figure 6.5 (b) shows the
selection COURSES WHERE DURATION > 12. Figure 6.5 (c) shows the
selection COURSES WHERE DURATION > 12 AND LOCATION =
'INDIANAPOLIS'.

Figure 6.5    Selection of COURSES Relation.
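
The three selections above can be sketched with a predicate function standing in for the WHERE condition. Only the C2 tuple below comes from the text; the other COURSES tuples, and the function name `select`, are assumptions introduced for illustration.

```python
def select(rows, predicate):
    """Return the tuples of a relation (list of dicts) satisfying predicate."""
    return [row for row in rows if predicate(row)]


# COURSES tuples; all except C2 are hypothetical.
courses = [
    {"CCODE": "C1", "TITLE": "Pascal", "DURATION": 12, "LOCATION": "Monterey"},
    {"CCODE": "C2", "TITLE": "Cobol", "DURATION": 8, "LOCATION": "Monterey"},
    {"CCODE": "C3", "TITLE": "Ada", "DURATION": 16, "LOCATION": "Indianapolis"},
    {"CCODE": "C4", "TITLE": "Fortran", "DURATION": 20, "LOCATION": "Norfolk"},
]

# COURSES WHERE LOCATION = 'MONTEREY'
in_monterey = select(courses, lambda c: c["LOCATION"] == "Monterey")

# COURSES WHERE DURATION > 12
long_courses = select(courses, lambda c: c["DURATION"] > 12)

# COURSES WHERE DURATION > 12 AND LOCATION = 'INDIANAPOLIS'
long_in_indy = select(
    courses,
    lambda c: c["DURATION"] > 12 and c["LOCATION"] == "Indianapolis",
)

print([c["CCODE"] for c in in_monterey])
print([c["CCODE"] for c in long_courses])
print([c["CCODE"] for c in long_in_indy])
```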


-Join

The join operation is a combination of the product, selection,
and (possibly) projection operations. The join of two
relations, say A and B, is denoted as A JOIN B, which is
equivalent to taking the Cartesian product of A and B and then
performing a suitable selection on that product. If necessary,
duplicate attributes can be eliminated by projection. The join
operation is a binary operation since it operates on two
relations, whereas selection and projection are operations on
single relations (i.e., they are unary operations).

Actually there are many possible join operations, in which the
"joining condition" is based on equality or inequality between
values in the common column of two relations. Those operations
are usually called equijoin, greater-than join, less-than join,
and natural join. A natural join is an equijoin with the
elimination of duplicate columns, and is a common relational
operation. For example, the equijoin and the greater-than join
produce the same results as the expressions

(A TIMES B) WHERE A.X = B.Y
(A TIMES B) WHERE A.X > B.Y

respectively, where A and B are relations, and X and Y are
attributes belonging to A and B, respectively. The values of
attributes X and Y must be drawn from some common domain.
Consider the OFFICER and COURSES relations shown in Figure 6.1
(a) and (b). Tables OFFICER and COURSES may be joined over
their CITY and LOCATION attributes; the result is shown in
Figure 6.6. We denote such a join as

(OFFICER JOIN COURSES) WHERE OFFICER.CITY = COURSES.LOCATION.

The join in Figure 6.6 is an equijoin. If the duplicate
attributes (CITY and LOCATION) were reduced to one, the new
relation would be the result of the natural join operation.
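
The equivalence between a join and a selection over a Cartesian product can be sketched directly. The tuples below are hypothetical, and the function name `theta_join` is an assumption; the sketch also assumes that the two relations have no attribute names in common, so that concatenated tuples can be held in one dictionary.

```python
def theta_join(a_rows, b_rows, condition):
    """(A TIMES B) WHERE condition: concatenate each pair of tuples and
    keep those pairs that satisfy the joining condition."""
    return [{**a, **b} for a in a_rows for b in b_rows if condition(a, b)]


# Hypothetical tuples, for illustration only.
officer = [
    {"MID": "ID1", "NAME": "Smith", "CITY": "Monterey"},
    {"MID": "ID2", "NAME": "Jones", "CITY": "Norfolk"},
]
courses = [
    {"CCODE": "C1", "TITLE": "Pascal", "LOCATION": "Monterey"},
    {"CCODE": "C2", "TITLE": "Cobol", "LOCATION": "Monterey"},
    {"CCODE": "C3", "TITLE": "Ada", "LOCATION": "Indianapolis"},
]

# Equijoin: (OFFICER JOIN COURSES) WHERE OFFICER.CITY = COURSES.LOCATION
equijoin = theta_join(officer, courses, lambda o, c: o["CITY"] == c["LOCATION"])

# Natural join: the equijoin with the duplicate column projected away.
natural = [{k: v for k, v in row.items() if k != "LOCATION"} for row in equijoin]

for row in natural:
    print(row)
```

A greater-than join would use the same function with a condition such as `lambda a, b: a["X"] > b["Y"]`.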

Figure 6.6    Join of OFFICER and COURSES over CITY and LOC.

-Division

The division operation has a binary relation R(X,Y) as the
dividend and a divisor that includes Y. The result is a set, S,
of values of X such that x belongs to S if there is a tuple
(x,y) in R for each y value in the divisor.
[Ref. 1:pp. 15-48]

Figure 6.7 illustrates this operation. If relation COURSE is
the dividend and relation COURSE_LOCATION is the divisor, then
CODES = COURSE/COURSE_LOCATION. In the figure, C2 is the only
course code for which there is a tuple with Monterey and a
tuple with Berkeley (i.e., <C2, Monterey> and <C2, Berkeley>)
in the COURSE relation. The other course codes, C1, C3, and C4,
do not satisfy this condition.

Figure 6.7    The DIVISION Operation.
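
Division is the least intuitive of the operators, so a small sketch may help. Only the two C2 tuples below are taken from the text; the remaining COURSE tuples are hypothetical, chosen so that C1, C3, and C4 fail the condition. The function name `divide` is an assumption.

```python
def divide(dividend, divisor):
    """Divide a binary relation (a set of (x, y) pairs) by a set of y values.

    The result holds every x that is paired in the dividend with each
    y value in the divisor."""
    xs = {x for x, _ in dividend}
    return {x for x in xs if all((x, y) in dividend for y in divisor)}


# COURSE(CCODE, LOCATION); tuples other than those for C2 are hypothetical.
course = {
    ("C1", "Monterey"),
    ("C2", "Monterey"),
    ("C2", "Berkeley"),
    ("C3", "Berkeley"),
    ("C4", "Indianapolis"),
}
course_location = {"Monterey", "Berkeley"}

codes = divide(course, course_location)
print(codes)
```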
C.   DATA SUBLANGUAGES FOR RELATIONAL DATABASES

The relational database model must provide languages to access
relations. A number of relational data sublanguages (DSLs) have
been proposed and developed. Because of the tabular structure
of relations, users can easily understand relational DSLs.
Another important feature of a relational DSL is its selective
power. Relational DSLs should have the capability to retrieve
data that satisfy any condition over any number of relations.
[Ref. 1:p. 36]

Early relational languages were based on selective power. Codd
[Ref. 12] gave the definition of the relational model in 1970
and defined the basis for relational languages such as
relational algebra and relational calculus. Relational calculus
has particular significance. It is a form of predicate calculus
specifically tailored to relational databases and is used to
measure the selective power of relational languages. A
relational language is relationally complete if it can produce
any data that can be obtained from a relational calculus
expression. [Ref. 1:p. 36]

Data Sublanguage ALPHA, which is based on relational calculus,
was presented by Codd [Ref. 12]. DSL ALPHA itself was never
implemented, but a language very similar to it, called QUEL,
was used as the query language in the relational DBMS called
INGRES [Ref. 13]. We will discuss INGRES in more detail in
Chapter VIII.

Another widely used data sublanguage is Structured Query
Language (SQL), which is currently implemented by the System R
relational database management system that runs on the IBM
System/370 [Ref. 14]. SQL provides retrieval functions and a
full range of update operations, and also many other
facilities. It can be used both from an on-line terminal and,
in the form of "embedded SQL," from an application program,
batch or on-line, written in either COBOL or PL/I.
[Ref. 3:pp. 145-156] The basic format of an SQL query, for
example, is

SELECT <attribute>

FROM <relation>

WHERE <conditional expression>
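
The SELECT-FROM-WHERE format can be tried with any SQL system. The sketch below uses Python's standard `sqlite3` module purely as a convenient modern stand-in: it is neither System R nor embedded SQL in COBOL or PL/I, and the table definition and its rows are hypothetical. It does, however, show the same idea of issuing an SQL query from within a host-language program.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses ("
    "ccode TEXT PRIMARY KEY, title TEXT, duration INTEGER, location TEXT)"
)

# Hypothetical COURSES tuples, for illustration only.
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?, ?)",
    [
        ("C1", "Pascal", 12, "Monterey"),
        ("C2", "Cobol", 8, "Monterey"),
        ("C3", "Ada", 16, "Indianapolis"),
    ],
)

# SELECT <attribute> FROM <relation> WHERE <conditional expression>
rows = conn.execute(
    "SELECT ccode, title FROM courses WHERE location = ? ORDER BY ccode",
    ("Monterey",),
).fetchall()
conn.close()

for ccode, title in rows:
    print(ccode, title)
```

In relational algebra terms, the WHERE clause plays the role of selection and the attribute list after SELECT plays the role of projection.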

Query-by-Example (QBE) is another relational system, designed
for users who are not programmers. It has approximately the
same selective power as SQL but uses a graphical interface. It
is therefore suitable only for terminal use and cannot be
embedded in a host language. [Ref. 1:pp. 181-200]
Query-by-Example is an artificial, self-contained,
user-directed specification language.

So far we have examined several aspects of database systems in
general and the relational database model in particular. But we
have not yet answered the following question: given a body of
data to be represented in a database, how do we decide what
relations are needed and what their attributes should be? This
is the database design problem, which will be discussed in the
next chapter.

VII.  RELATIONAL DATABASE DESIGN

A.   INTRODUCTION

In designing relational databases, the primary goal is to
ensure that relations represent the original data
specifications correctly and without redundancy. The major
concept in relational database design is the normalization
process, that is, the process of grouping the data elements
into relations representing entities and their relationships.
The idea of normalization is based on the observation that a
certain set of relations has better properties under database
operations such as inserting, updating, and deleting than do
other sets of relations containing the same data. In other
words, the objective of normalization is to produce a database
design that can be manipulated in a powerful way with a simple
collection of operations while minimizing update anomalies and
data inconsistencies [Ref. 15:pp. 99-126]. Normalization theory
is a useful aid in the database design process, but it is not
an exact solution.

B.   NORMAL FORMS

Normalization  theory  is  traditionally  expressed  through 
a  set  of  so-called  normal  forms  that  progressively  constrain 
the  structure  and  contents  of  a  relation.  A  relation  is  said 
to  be  in  a  particular  normal  form  if  it  satisfies  a  certain 
specified  set  of  constraints. 

There are numerous normal forms which have been defined by the
relational theorists. As shown in Figure 7.1 [Ref. 3:p. 239],
these normal forms are nested: each one is contained in those
that precede it. If a relation, for example, is in third normal
form (3NF), then it is automatically in first and second normal
forms. None of these normal forms will eliminate all anomalies;
each normal form eliminates just certain anomalies. But
R. Fagin [Ref. 11] defined a new normal form called domain/key
normal form (DK/NF), and he showed that a relation in DK/NF is
free of all modification anomalies, regardless of their type.
The point is to find ways to put relations in DK/NF. If the
database designer does this, then he is guaranteed that those
relations will have no anomalies. Unfortunately, it is not even
known whether all relations can be put into DK/NF. At this
point we need the concept of functional dependency to define
these relational normal forms.

1.   Functional Dependency

Functional dependency (FD) is a term derived from mathematical
theory; it relates the dependence of values of one attribute or
set of attributes on those of another attribute or set of
attributes. Formally, an attribute (or set of attributes), Y,
in a relation is said to be functionally dependent on another
attribute (or set of attributes), X, if knowing the value of X
is sufficient to determine the value of Y. To put it another
way, there is only one value of Y associated with any value of
X. The notation X --> Y is often used to denote that Y is
functionally dependent on X, and is read: X functionally
determines Y. The attribute (or set of attributes) X is known
as the determinant of the FD X --> Y. It is obvious that the
nonkey attributes of any relation are functionally dependent on
the key.

To illustrate the basic principles of functional dependencies,
consider the sample database in Figure 6.1. The attribute TITLE
in relation COURSES is functionally dependent on CCODE because
each course has one given title value. Thus once a course code
is known, a unique value of course title is immediately
determined. The FD for this example is shown as
CCODE --> TITLE.

Universe of relations (normalized and unnormalized)
  First Normal Form (1NF)
    Second Normal Form (2NF)
      Third Normal Form (3NF)
        Boyce-Codd Normal Form (BCNF)
          Fourth Normal Form (4NF)
            Fifth Normal Form (5NF)
              Domain/Key Normal Form (DK/NF)

Figure 7.1    Relational Normal Forms.

Likewise, in relation COURSE_ATTENDED, once values for officer
ID (MID) and CCODE are known, a unique value of GRADE for that
officer in that course is determined. This FD is defined as
MID,CCODE --> GRADE.
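
Whether a proposed FD is contradicted by the tuples currently in a relation can be checked with a short function. The sketch below is illustrative: the COURSE_ATTENDED tuples are hypothetical and the function name `fd_holds` is an assumption. As with keys, an FD is a statement about all possible extensions, so passing this check does not prove that the FD holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in the given tuples, each value of the determinant lhs is
    associated with exactly one value of rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault records y for a new x and returns the recorded y otherwise.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical COURSE_ATTENDED tuples, for illustration only.
course_attended = [
    {"MID": "ID1", "CCODE": "C1", "GRADE": "B+"},
    {"MID": "ID1", "CCODE": "C2", "GRADE": "A"},
    {"MID": "ID2", "CCODE": "C1", "GRADE": "A-"},
]

print(fd_holds(course_attended, ["MID", "CCODE"], ["GRADE"]))
print(fd_holds(course_attended, ["CCODE"], ["GRADE"]))
```

Here MID,CCODE --> GRADE is consistent with the data, whereas CCODE --> GRADE is refuted because course C1 appears with two different grades.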

It is convenient to represent the FDs in a given set of
relations by means of a functional dependency diagram, an
example of which is shown in Figure 7.2. It is also possible to
have two attributes that are functionally dependent on each
other. In relation COURSES, both CCODE --> TITLE and
TITLE --> CCODE hold (because TITLE is an alternate key for the
relation COURSES). The notation CCODE <--> TITLE is commonly
used to illustrate such a mutual FD.


OFFICER:
    MID --> RANK
    MID --> NAME
    MID --> CITY

COURSES:
    CCODE <--> TITLE
    CCODE --> CDESCRIPT
    CCODE --> DURATION
    CCODE --> LOCATION

COURSE_ATTENDED:
    (MID, CCODE) --> GRADE

Figure 7.2    Functional Dependency Diagrams.

We also need to introduce the concept of full functional
dependency. This term is used to show the minimum set of
attributes in a determinant of an FD [Ref. 1:pp. 15-48].
Attribute (or set of attributes) Y is said to be fully
functionally dependent on attribute (or set of attributes) X
if Y is functionally dependent on X and Y is not functionally
dependent on any proper subset of X. For example, in the
relation COURSE_ATTENDED, the attribute GRADE is fully
functionally dependent on the attributes (MID, CCODE) because
it is not functionally dependent on either MID or CCODE
alone. On the other hand, in the relation COURSES, the
attribute CDESCRIPT is functionally dependent on the
attributes (CCODE, TITLE); however, it is not fully
functionally dependent on those attributes because it is also
functionally dependent on either CCODE or TITLE alone.
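
The two definitions above can be illustrated with a short Python
sketch. It is only an illustration, not part of the design method:
the sample rows and the helper names holds and fully_dependent are
assumptions made for this example. Note that a relation instance
can refute an FD but cannot prove that it holds for all possible
data.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Return True if the FD lhs --> rhs is not violated by these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The same determinant value must always map to the same rhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True

def fully_dependent(rows, lhs, rhs):
    """True if lhs --> rhs holds and no proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False
    return True

# Hypothetical sample data (not taken from the thesis).
course_attended = [
    {"MID": "ID1", "CCODE": "C1", "GRADE": "A"},
    {"MID": "ID1", "CCODE": "C2", "GRADE": "B"},
    {"MID": "ID2", "CCODE": "C1", "GRADE": "B"},
]
courses = [
    {"CCODE": "C1", "TITLE": "Wepsys", "CDESCRIPT": "Weapon systems"},
    {"CCODE": "C2", "TITLE": "Navsys", "CDESCRIPT": "Navigation systems"},
]

grade_is_full = fully_dependent(course_attended, ("MID", "CCODE"), ("GRADE",))
descript_is_full = fully_dependent(courses, ("CCODE", "TITLE"), ("CDESCRIPT",))
print(grade_is_full, descript_is_full)
```

With this sample data GRADE is fully dependent on (MID, CCODE),
while CDESCRIPT is not fully dependent on (CCODE, TITLE) because
CCODE alone already determines it.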

2.  First, Second, Third, and Boyce-Codd Normal Forms

First normal form (1NF) deals with the "shape" of a record
type or a tuple. Under first normal form, all tuples in a
relation must have the same set of attributes, and the
attributes must be atomic (indivisible items). This
definition merely states that any normalized relation is in
first normal form.

When determining whether a particular relation is in normal
form, the FDs between the attributes in the relation must be
examined. For this reason, we will use a notation which was
first proposed by [Ref. 16] to point out these relational
characteristics. In the notation, the relation is defined as
divided into two components: the attributes and the FDs
between them. The format is

     R = ({X,Y,Z}, {X-->Y, X-->Z})

R is the name of the relation, X, Y, and Z are the
attributes, and X-->Y, X-->Z are FDs. For example, in
Figure 6.1 the relation COURSE_ATTENDED is defined as

     COURSE_ATTENDED = ({MID,CCODE,GRADE}, {MID,CCODE-->GRADE})

Many update and deletion anomalies can be eliminated by
converting a relation to second normal form (2NF). Second
normal form requires that all nonkey attributes contain
information that refers to the entire key, not just part of
it. In other words, a relation is said to be in 2NF if and
only if it is in 1NF and every nonkey attribute of the
relation is fully functionally dependent on the primary key.
The relation UNIT_ASSIGNED in Figure 7.3, for example, is in
1NF but not in 2NF because the nonkey attribute GSTATUS
(geographical status) is not fully dependent on the primary
key, which consists of UCODE (unit code) and MID. Here
GSTATUS is fully functionally dependent on UCODE, which is a
subset of the primary key.
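
The 2NF test can be sketched in Python as a search for nonkey
attributes that are determined by a proper subset of the primary
key. This is a minimal sketch under stated assumptions: the helper
names closure and partial_dependencies are hypothetical, and the
FDs are the ones this chapter lists for UNIT_ASSIGNED.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(attributes, key, fds):
    """Map each proper subset of the key to the nonkey attributes it determines."""
    nonkey = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            hit = closure(subset, fds) & nonkey
            if hit:
                found[subset] = hit
    return found

UNIT_ASSIGNED = ("UCODE", "LOCATION", "GSTATUS", "MID", "DATE")
FDS = [
    (("UCODE",), ("LOCATION",)),
    (("UCODE",), ("GSTATUS",)),
    (("LOCATION",), ("GSTATUS",)),
    (("UCODE", "MID"), ("DATE",)),
]

violations = partial_dependencies(UNIT_ASSIGNED, ("UCODE", "MID"), FDS)
print(violations)
```

An empty result means that no nonkey attribute depends on only part
of the key, so the relation passes the 2NF test for the FDs supplied.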


Relation: UNIT_ASSIGNED     Key: UCODE,MID

UCODE   LOCATION   GSTATUS   MID   DATE
U1      Monterey   100       ID1   012583
U1      Monterey   100       ID4   042385
U1      Monterey   100       ID5   012581
U1      Monterey   100       ID2   110182
U2      Newyork    200       ID6   083084
U3      Denver     300       ID3   072882
U3      Denver     300       ID5   100584
U4      Newyork    200       ID1   031584

Figure 7.3  Relation in 1NF but not in 2NF.

In Figure 7.4 relation UNIT_ASSIGNED has been decomposed
into two relations, UNITS and ASSIGNMENT. Both relations are
in 2NF. Note that the relation UNIT_ASSIGNED suffers from
modification anomalies with respect to update operations.
Figure 7.5 also illustrates the FDs for both relations.

Problems occur with each of the following three basic
operations.

UNITS   key: UCODE

UCODE   LOCATION   GSTATUS
U1      Monterey   100
U2      Newyork    200
U3      Denver     300
U4      Newyork    200

ASSIGNMENT   key: MID,UCODE

MID   UCODE   DATE
ID1   U1      012583
ID1   U4      031584
ID2   U1      110182
ID3   U3      072882
ID4   U1      042385
ID5   U1      012581
ID5   U3      100584
ID6   U2      083084

Figure 7.4  Relations in 2NF.
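
The decomposition shown in Figure 7.4 can be reproduced with two
projections, and a natural join on UCODE shows that no information
is lost. The following Python sketch uses the rows as read from
Figure 7.3; the helper names project, natural_join, and as_set are
assumptions made for this illustration.

```python
def project(rows, attrs):
    """Projection with duplicate elimination, keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

def natural_join(left, right):
    """Natural join on the attribute names the two relations share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(r.items())) for r in rows}

ATTRS = ("UCODE", "LOCATION", "GSTATUS", "MID", "DATE")
unit_assigned = [dict(zip(ATTRS, t)) for t in [
    ("U1", "Monterey", 100, "ID1", "012583"),
    ("U1", "Monterey", 100, "ID4", "042385"),
    ("U1", "Monterey", 100, "ID5", "012581"),
    ("U1", "Monterey", 100, "ID2", "110182"),
    ("U2", "Newyork", 200, "ID6", "083084"),
    ("U3", "Denver", 300, "ID3", "072882"),
    ("U3", "Denver", 300, "ID5", "100584"),
    ("U4", "Newyork", 200, "ID1", "031584"),
]]

units = project(unit_assigned, ("UCODE", "LOCATION", "GSTATUS"))
assignment = project(unit_assigned, ("MID", "UCODE", "DATE"))
rejoined = natural_join(units, assignment)
lossless = as_set(rejoined) == as_set(unit_assigned)
print(len(units), len(assignment), lossless)
```

The location and status of unit U1 are stored once in UNITS instead
of four times, which is the redundancy the decomposition removes.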


UNITS:       UCODE --> LOCATION
             UCODE --> GSTATUS

ASSIGNMENT:  (MID, UCODE) --> DATE

Figure 7.5  FD Diagrams for UNITS and ASSIGNMENT.

Inserting: We cannot enter the fact that a particular unit
is located in a particular city until at least one officer
is assigned to that unit.

UCL   Key: UCODE

UCODE   LOCATION
U1      Monterey
U2      Newyork
U3      Denver
U4      Newyork

ULG   Key: LOCATION

LOCATION   GSTATUS
Denver     300
Monterey   100
Newyork    200

Figure 7.6  Relations in 3NF.

The original definition of 3NF was subsequently replaced by
a stronger definition known as Boyce/Codd Normal Form (BCNF),
which can be defined as follows:

     "A relation R is in Boyce/Codd Normal Form (BCNF) if and
     only if every determinant is a candidate key."
     [Ref. 3:p. 249]

The original 3NF definition does not satisfactorily handle
the case of a relation that has more than one candidate key,
and modification anomalies arise with this definition when
it is used with such relations. BCNF is often used to remove
these anomalies. For example, consider the relation
UNIT_ASSIGNED (Fig. 7.3) and the FDs between the attributes
of that relation such as

     UNIT_ASSIGNED = ({UCODE, LOCATION, GSTATUS, MID, DATE},
          {UCODE-->LOCATION, UCODE-->GSTATUS, LOCATION-->GSTATUS,
           UCODE,MID-->DATE})

Here the relation UNIT_ASSIGNED contains three determinants
but only the determinant (UCODE, MID) is a candidate key.
Therefore UNIT_ASSIGNED is not BCNF. Similarly, UNITS (Fig.
7.4) is not BCNF, because the determinant LOCATION is not a
candidate key. On the other hand, relations ASSIGNMENT, UCL,
and ULG are each BCNF, because in each case the candidate
key is the only determinant in the relation.

BCNF  is  conceptually  simpler  than  3NF  since  it  does 
not  reference  the  concepts  of  primary  key,  transitive  depen- 
dency, and  full  dependency.  Although  BCNF  is  stronger  than 
3NF,  it  is  still  true  that  any  relation  can  be  decomposed  in 
a  nonloss  way  into  an  equivalent  collection  of  BCNF 
relations. 
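
The BCNF test can be sketched in Python with an attribute-closure
computation. This is a hedged illustration: the helper names closure
and bcnf_violations are hypothetical, and the sketch tests whether
each determinant is a superkey, which agrees with the candidate-key
wording of the definition when the determinants contain no redundant
attributes. It examines only the FDs that are supplied.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """List the nontrivial FDs whose determinant does not determine every attribute."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)
            and closure(lhs, fds) != set(attributes)]

UNIT_ASSIGNED = ("UCODE", "LOCATION", "GSTATUS", "MID", "DATE")
UNIT_ASSIGNED_FDS = [
    (("UCODE",), ("LOCATION",)),
    (("UCODE",), ("GSTATUS",)),
    (("LOCATION",), ("GSTATUS",)),
    (("UCODE", "MID"), ("DATE",)),
]
UNITS = ("UCODE", "LOCATION", "GSTATUS")
UNITS_FDS = UNIT_ASSIGNED_FDS[:3]

bad_unit_assigned = bcnf_violations(UNIT_ASSIGNED, UNIT_ASSIGNED_FDS)
bad_units = bcnf_violations(UNITS, UNITS_FDS)
bad_ucl = bcnf_violations(("UCODE", "LOCATION"), [(("UCODE",), ("LOCATION",))])
bad_ulg = bcnf_violations(("LOCATION", "GSTATUS"), [(("LOCATION",), ("GSTATUS",))])
print(len(bad_unit_assigned), bad_units, bad_ucl, bad_ulg)
```

For UNITS the only offending FD is LOCATION --> GSTATUS, which
matches the argument given in the text; UCL and ULG produce no
violations.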

3.  Fourth and Fifth Normal Forms

Fourth and fifth normal forms deal with multivalued
attributes. A multivalued attribute may correspond to a
many-to-many relationship, as with officers and skills, or
to a many-to-one relationship, as with the children of an
officer. By "many-to-many" we mean that an officer may have
several skills and/or a skill may belong to several
officers. When we look at the many-to-one relationship
between children and fathers, it is a single-valued fact
about a child but a multivalued fact about a father. In some
sense, 4NF and 5NF are also related with composite keys.
These normal forms attempt to minimize the number of
attributes involved in a composite key. [Ref. 17]

Fourth normal form is based on the concept of multivalued
dependency (MVD). The notation X-->>Y is used to indicate
that a set of attributes Y is multidependent on a set of
attributes X. Formally, MVD is defined as follows: Given a
relation R with attributes X, Y, and Z, the multivalued
dependency X-->>Y holds in R if and only if the set of
Y-values matching a given (X-value, Z-value) pair in R
depends only on the X-value and is independent of the
Z-value. The attributes X, Y, and Z may be composite.
[Ref. 3:pp. 237-265]

Multivalued dependencies as defined above can exist only if
the relation R has at least three attributes. It is easy to
show that, in the relation R(X,Y,Z), the MVD X-->>Y holds if
and only if the MVD X-->>Z holds.

Before giving the definition of 4NF, it is convenient to
state the following theorem proved by Fagin in [Ref. 18].

     "Relation R, with attributes X, Y, and Z, can be
     nonloss-decomposed into its two projections R1(X,Y) and
     R2(X,Z) if and only if the MVD X-->>Y|Z holds in R."

Now fourth normal form (4NF) is defined as follows:

     "A relation R is in fourth normal form (4NF) if and only
     if, whenever there exists an MVD in R, say X-->>Y, then
     all attributes of R are also functionally dependent on X
     (i.e., X-->Z for all attributes Z of R)."
     [Ref. 3:p. 259]

Fagin also proves (see [Ref. 18]) that 4NF is strictly
stronger than BCNF (i.e., any 4NF relation is necessarily in
BCNF), and any relation can be nonloss-decomposed into an
equivalent collection of 4NF relations.
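
The MVD definition can be checked on a relation instance with a
few lines of Python. This is a sketch under stated assumptions:
the relation of officers, skills, and languages and the helper
name mvd_holds are hypothetical. For each X-value, X-->>Y holds
exactly when the (Y, Z) pairs that occur form the full cross
product of the Y-values and Z-values seen with that X-value.

```python
def mvd_holds(rows, x, y, z):
    """Test X -->> Y on an instance whose attributes are split into x, y, z."""
    groups = {}
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        ys, zs, pairs = groups.setdefault(xv, (set(), set(), set()))
        ys.add(yv)
        zs.add(zv)
        pairs.add((yv, zv))
    # pairs is always a subset of ys x zs, so equal sizes mean equal sets.
    return all(len(pairs) == len(ys) * len(zs)
               for ys, zs, pairs in groups.values())

COLS = ("MID", "SKILL", "LANGUAGE")
# Hypothetical data: skills and languages of an officer are independent.
independent = [dict(zip(COLS, t)) for t in [
    ("ID1", "Radar", "French"),
    ("ID1", "Radar", "German"),
    ("ID1", "Sonar", "French"),
    ("ID1", "Sonar", "German"),
    ("ID2", "Radar", "French"),
]]
# Dropping one combination breaks the independence.
broken = independent[:3] + independent[4:]

ok_skill = mvd_holds(independent, ("MID",), ("SKILL",), ("LANGUAGE",))
ok_language = mvd_holds(independent, ("MID",), ("LANGUAGE",), ("SKILL",))
ok_broken = mvd_holds(broken, ("MID",), ("SKILL",), ("LANGUAGE",))
print(ok_skill, ok_language, ok_broken)
```

The first two results agree, as expected from the symmetry of MVDs
in a three-attribute relation, and by Fagin's theorem the first
instance can be split into (MID, SKILL) and (MID, LANGUAGE) without
loss.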

Fifth normal form (5NF) deals with cases where information
can be reconstructed from smaller pieces of information
which can be maintained with less redundancy. 2NF, 3NF, and
4NF also serve this purpose, but 5NF generalizes to cases
not covered by the others. Aho and co-workers in 1979
[Ref. 19] discovered relations that cannot be losslessly
decomposed into two relations but can be losslessly
decomposed into three or more relations. Because of this
property, 5NF is also called projection-join normal form,
and is based on the concept of join dependency (JD), which
is a more general case of an MVD. In general, relation R
satisfies the JD *(X,Y,...,Z) if and only if it is the join
of its projections on X,Y,...,Z, where X,Y,...,Z are subsets
of the set of attributes of R [Ref. 3:pp. 237-265]. We can
now define 5NF as given by [Ref. 3:p. 262].

     "A relation R is in fifth normal form (5NF)-also called
     projection-join normal form (PJ/NF)-if and only if
     every join dependency in R is implied by the candidate
     keys of R."

Since a JD is a more general case of an MVD, any relation
which is in 5NF is necessarily in 4NF. But determining that
a relation is in 5NF is less straightforward than for 4NF,
BCNF, etc. because discovering join dependencies is a
nontrivial task.
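
A small Python sketch can show a relation of the kind described
above: every pair of its binary projections joins to a relation
with a spurious tuple, while the join of all three projections
returns exactly the original tuples. The data and the helper names
join2 and join3 are hypothetical and chosen only for illustration.

```python
def join2(xy, yz):
    """Join two binary projections that share their middle attribute."""
    return {(x, y, z) for (x, y) in xy for (y2, z) in yz if y == y2}

def join3(xy, yz, xz):
    """Join all three binary projections back into (x, y, z) tuples."""
    return {(x, y, z) for (x, y, z) in join2(xy, yz) if (x, z) in xz}

# Hypothetical relation R(X, Y, Z).
r = {
    ("a1", "b1", "c2"),
    ("a1", "b2", "c1"),
    ("a2", "b1", "c1"),
    ("a1", "b1", "c1"),
}
xy = {(x, y) for (x, y, z) in r}
yz = {(y, z) for (x, y, z) in r}
xz = {(x, z) for (x, y, z) in r}

# All three possible two-way joins of the projections.
two_way = [
    join2(xy, yz),
    {(x, y, z) for (x, y) in xy for (x2, z) in xz if x == x2},
    {(x, y, z) for (y, z) in yz for (x, z2) in xz if z == z2},
]
two_way_lossless = [j == r for j in two_way]
three_way_lossless = join3(xy, yz, xz) == r
print(two_way_lossless, three_way_lossless)
```

Each two-way join here contains five tuples, one more than R, so R
satisfies the join dependency on its three projections without
satisfying any of the corresponding two-way decompositions.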

4.  Domain/Key Normal Form

In 1981, R. Fagin [Ref. 11] defined a new normal form called
domain/key normal form (DK/NF). In his paper he proved that
a relation in DK/NF will have no insertion or deletion
anomalies. He also showed that a relation having no
modification anomalies must be in DK/NF. DK/NF is based on
only the concepts of key and domain. These concepts are
readily known and supported by DBMS products. The definition
of DK/NF is quite simple.

     "A relation is in DK/NF if every constraint on the
     relation is a logical consequence of the definition of
     keys and domains." [Ref. 6:p. 299]

In this definition, constraint is a broad term. Any rule on
static values of attributes that can be evaluated precisely
as true or false is said to be a constraint. Thus FDs, MVDs,
JDs, and edit rules are all examples of constraints.
Constraints which have to do with changes in data values are
excluded from the definition of constraint.

DK/NF means that if keys and domains can be defined such
that all constraints will be satisfied when the key and
domain definitions are satisfied, then modification
anomalies are impossible. But there is no known way to put a
relation in DK/NF automatically. In spite of this problem,
DK/NF can be extremely useful for practical database design.
DK/NF is a design objective. Database designers wish to
define their relations such that constraints are logical
consequences of domains and keys. This goal can be
accomplished for many designs.

Seven normal forms have been discussed and are summarized in
Figure 7.7 [Ref. 6:p. 305].


Form      Defining Characteristic

1NF       Any relation.

2NF       All nonkey attributes are dependent
          on all of the keys.

3NF       There are no transitive dependencies.

BCNF      Every determinant is a candidate key.

4NF       Every MVD is a functional dependency.

5NF       Join dependencies are satisfied.

DK/NF     All constraints on relations are logical
          consequences of domains and keys.


Figure 7.7  Summary of Normal Forms.


C.  RELATIONAL DESIGN PROCEDURES AND CRITERIA

1.  Design Procedures

The relational model is attractive in database design since
it provides formal criteria for logical structure, namely,
normal form relations. In order to produce those relations,
database designers should choose a design procedure. Two
different approaches have been proposed:

     "1. Decomposition procedures. These commence with a set
     of one or more relations and decompose nonnormal
     relations in this set into normal forms.

     2. Synthesis procedures. These commence with a set of
     functional dependencies and use them to construct normal
     form relations." [Ref. 1:p. 59]

In practical situations, synthesis procedures are more
attractive than decomposition procedures. Many algorithms
have been proposed for relational design and each algorithm
produces relations that satisfy some subset of the
relational design criteria which will be discussed in the
next Section.

Decomposition algorithms start with one relation and
successively decompose it into normal form relations. 3NF
and BCNF alone are not sufficient for applying these
decomposition algorithms, so the ideas of MVD and 4NF have
to be known. Synthesis algorithms, on the other hand, start
with a set of FDs and synthesize them into normal form
relations. In other words, these algorithms use FDs to
produce normal form relations. Detailed information about
design algorithms can be found in [Ref. 1:pp. 59-88].

2.  Relational Database Design Criteria

This Section presents several different design criteria
which have been identified in [Ref. 6:pp. 307-311] and
[Ref. 16] for producing an effective relational database.

a. Elimination  of  Modification  Anomalies 

The  objective  of  this  criterion  is  to  eliminate 
all  anomalies  resulting  from  database  operations .  As  we  have 
seen,  if  relations  are  in  DK/NF,  then  no  modification  anoma- 
lies can  occur.  This  is  why  DK/NF  is  a  design  objective.  The 
problem  is  to  find  a  way  that  all  relations  can  be  put  into 
DK/NF. 

b. Relation  Independence 

According  to  this  design  goal,  two  relations  are 
said  to  be  independent  if  modifications  can  be  made  to  one 
without  regard  for  the  other.  However,  this  criterion  is  not 
always  achievable.  Interrelation  constraints  allow  relations 
to  be  dependent.  To  eliminate  this  dependency  the  relations 
can  be  joined  together.  After  the  join  operation,  the  new 
relation  may  have  modification  anomalies.  To  eliminate  these 
anomalies, relations are decomposed into two or more
relations; but this operation creates interrelation
dependencies again. Here we see the conflict in design
goals. In this case we must choose the least of the evils,
based on the requirements of the application.

c.  Ease  of  Use 

This third criterion for a relational database design makes
the relations seem natural to users. As far as possible,
designers should attempt to structure the relations so that
they are familiar to users. From time to time this criterion
conflicts with the other two criteria.

d. Representation

This relational criterion states that the final structure
has to correctly represent the original specifications. That
is, all the relations in the output of the design process
must satisfy the conditions for normal form. C. Beeri and
co-workers have defined three important points for the
representation of a set of relations, Sin, in the input of
the design process by a set of relations, Sout, in the
output of the design process (Sin and Sout are sets of
relations used in the input and output design processes):

     "-REP1: The relations Sout contain the same attributes
             as Sin.

      -REP2: The relations Sout contain the same attributes
             and the same FDs as Sin.

      -REP3: The relations Sout contain the same attributes
             and the same data as Sin." [Ref. 1:p. 63]

The first representation, REP1, requires all the attributes
in Sin to also be in the relations in Sout, but it does not
address any dependencies between the attributes.

In regard to REP2, representation requires that each FD in
Sin be either contained as an FD in one of the relations in
Sout or derived from the FDs in the relations in Sout, using
the FD inference rules.

The third representation criterion, REP3, also requires that
the relations in Sout contain exactly the same tuples as the
original relations in Sin.
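
The REP2 condition can be sketched as a closure test: every FD of
Sin must follow from the FDs carried by the relations of Sout. The
Python sketch below is a simplification made for illustration; it
pools the FDs declared on the output relations rather than
computing projected FD sets, and the helper names closure and
preserves_fds are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves_fds(fds_in, fds_out):
    """REP2-style test: each input FD is present in or derivable from fds_out."""
    return all(set(rhs) <= closure(lhs, fds_out) for lhs, rhs in fds_in)

# FDs of the input relation UNIT_ASSIGNED.
S_IN = [
    (("UCODE",), ("LOCATION",)),
    (("UCODE",), ("GSTATUS",)),
    (("LOCATION",), ("GSTATUS",)),
    (("UCODE", "MID"), ("DATE",)),
]
# FDs carried by the output relations UCL, ULG, and ASSIGNMENT.
S_OUT = [
    (("UCODE",), ("LOCATION",)),
    (("LOCATION",), ("GSTATUS",)),
    (("UCODE", "MID"), ("DATE",)),
]

rep2_ok = preserves_fds(S_IN, S_OUT)
# Dropping ULG from the output loses the FDs that determine GSTATUS.
rep2_without_ulg = preserves_fds(S_IN, [S_OUT[0], S_OUT[2]])
print(rep2_ok, rep2_without_ulg)
```

UCODE --> GSTATUS is not stored in any single output relation, but
it is derived from UCODE --> LOCATION and LOCATION --> GSTATUS by
transitivity, so the first test succeeds.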

e. Separation 

The separation criterion means that the original
specifications are separated into relations that satisfy
certain conditions. As we have discussed earlier in this
Chapter, the database must be divided into a number of
normal form relations.

f. Redundancy

This last criterion points out the fact that the final
structure must not contain any redundant information. It is
possible to define the redundancy criterion in different
ways. One set of redundancy criteria is shown below:

     "-RED1: A relation in Sout is redundant if its
             attributes are contained in the other relations
             in Sout.

      -RED2: A relation in Sout is redundant if its FDs are
             the same or can be derived from the FDs in the
             other relations in Sout.

      -RED3: A relation in Sout is redundant if its content
             can be derived from the contents of other
             relations in Sout." [Ref. 1:p. 66]

Here, RED1 is not a very useful idea, because during
decomposition it is often necessary to create separate
relations that represent FDs between attributes, which may
appear in other relations. RED2 and RED3, however, can be
very useful criteria. Any design algorithm should avoid RED3
because it would keep the same data in more than one
relation.

The  design  criteria  discussed  in  this  Section  can 
conflict.  When  conflicts  occur,  the  designer  has  to  assess 
priorities  and  make  the  best  possible  compromise  in  light  of 
requirements. There is no single rule of priority.

D.  TRANSFORMING THE SDM INTO RELATIONAL MODEL

Figure 7.8 illustrates the logical design of the Personnel
database (see SDM in Chapter V). This logical schema cannot
be used to implement the relational personnel database for
the following reasons:

1.  Most of the relations have multivalued attributes.
    Such attributes cannot be used in a relation.

2.  The logical schema allows some tuples to be contained
    in other tuples. The relations must be normalized or
    redefined to eliminate these inconsistencies.

As shown in Figure 7.8, inversion, matching, and derivation
have been used to provide interrelationships between the
attributes. Inverse and match functions must be eliminated
in order to achieve DK/NF. During this process, the new
interrelation constraints should be added.

Initially, the relationships between OFFICER and
ACADEMIC_EDUCATION, MILITARY_EDUCATION, MEDICAL_INFO, and
FOREIGN_LANGUAGE were assumed as one-to-many. For example,
an officer can have more than one medical report, and many
reports may belong to one officer. Such relationships were
described by the match function in the SDM design. On the
other hand, relations OFFICER and UNIT have many-to-many
relationships with each other, and these relationships were
defined by the inverse function in the same SDM. The
relationship, for example, between unit_assigned and
officer_assigned is many-to-many. To eliminate this problem,
a new relation called ASSIGNMENT has been constructed.

By considering all those conditions, rules, design criteria,
etc. described in this Chapter, the resulting relational
design (relational schema) is illustrated in Figure 7.9, and
domain definitions with attribute/domain correspondences are
shown in Figure 7.10 and Figure 7.11 respectively. For
simplicity, some attributes are removed in the relational
schema in Fig. 7.9, and they are referred to by the
attribute OTHERS. A sample of the designed database with
example data is given in Appendix B.

In order to be familiar with some currently available DBMSs,
the INGRES DBMS will be introduced in Chapter VIII and the
sample personnel database in Appendix B will also be
implemented by using another DBMS known as ORACLE in Chapter
IX.

OFFICER (Military_ID, Rank, Date_of_promotion, Name,
     Birth_date, Beginning_date_to_active_duty,
     Native_country, Sex, Marital_status,
     Number_of_children, Permanent_address,
     Current_address, Primary_branch, Secondary_branch,
     Academic_education, Military_education,
     Health_condition, Foreign_language_capability,
     Unit_assigned)

     Key: Military_ID
     Notes: 1. Academic_education is a contained
               ACADEMIC_MAJOR tuple, multivalued.
            2. Military_education is a contained
               MILITARY_EDUCATION/COURSES tuple,
               multivalued.
            3. Health_condition is a contained
               MEDICAL_INFO tuple, multivalued.
            4. Foreign_language_capability is a contained
               FOREIGN_LANGUAGE tuple, multivalued.
            5. Unit_assigned is a contained UNIT tuple,
               multivalued.

UNIT (Unit_code, Name, Unit_category, Location,
     Superior_unit, Unit_function, Officer_assigned)

     Key: Unit_code
     Note: Officer_assigned is a contained OFFICER tuple,
           multivalued.

ACADEMIC_MAJOR (Academic_branch, Academic_degree, AID,
     Date, Name_of_university)

     Key: (Academic_branch, Academic_degree, AID)

MILITARY_EDUCATION/COURSES (Course/Military_school_code,
     Location, MEID, Course/School_title, Duration, Date,
     Grade)

     Key: (Course/Military_school_code, Location, MEID)

MEDICAL_INFO (Medical_report_number, HID, Date, Height,
     Weight, Blood_pressure, Eye_condition, Ear_condition,
     Internal, General_health_status)

     Key: (Medical_report_number, HID)

FOREIGN_LANGUAGE (Name_of_language, FID,
     Degree_of_capability)

     Key: (Name_of_language, FID)

ASSIGNM

     Key: (Unit_code, Request_number)


Figure 7.8  Summary of Logical Design.


OFFICER (MID, Rank, Name, Sex, Pri_bran, Sec_bran, Others)
     Key: MID

UNIT (Ucode, Uname, Ucat, Uloc, Sup_unit, Ufunc)
     Key: Ucode

A_EDUCATION (Abran, Adeg, AID, Univ, Gdate)
     Key: (Abran, Adeg, AID)

M_EDUCATION (Ccode, Cloc, MEID, Cgrade, Cdate)
     Key: (Ccode, Cloc, MEID)

M_COURSES (Ccode, Cloc, Ctitle, Cdesc, Cdur)
     Key: (Ccode, Cloc)

MEDICAL (Repno, HID, Rdate, Eyecond, Earcond, Hstat, Others)
     Key: (Repno, HID)

LANGUAGE (Nlanguage, FID, Ldegree)
     Key: Nlanguage

ASGREQ (R_ucode, Reqnum, Reqdate, R_rank, R_pribr,
     R_secbr, R_acabr, R_miled, R_hstat, Numofpers)
     Key: (R_ucode, Reqnum)

ASSIGNMENT (AMID, A_ucode, Orderno, Asgdate)
     Key: (AMID, A_ucode, Asgdate)


Figure 7.9  Relational Schema for Personnel Database.

DOMAIN NAME         FORMAT and MEANING

MID                 numeric 999999999
RANK                CHAR(4); abbreviations of military
                    ranks for the army
PERSON_NAMES        CHAR(20); names of officers
SEX                 CHAR(1); value is 'M' or 'F'
BRANCHES            CHAR(8); abbreviations of military
                    branches for the army
UNIT_CODE           CHAR(6); unit codes
UNIT_NAMES          CHAR(15); names of units
UNIT_CAT            CHAR(4); unit categories
                    ('div', 'hosp', etc.)
LOCATION            CHAR(15); names of locations
UNIT_FUNC           CHAR(6); unit functions
ACADEMIC_BRANCHES   CHAR(5); abbreviations of
                    academic branches
ACADEMIC_DEGREES    CHAR(3); value is 'BA', 'BS', 'DR',
                    'MA', 'MS', or 'ENG'
UNIVERSITY_NAMES    CHAR(10); names of universities
DATE                CHAR(9); format is DD-MMM-YY
COURSE/SCH_CODE     CHAR(6); course or school codes
COURSE/SCH_TITLES   CHAR(10); titles of courses
COURSE/SCH_DESC.    CHAR(30); description of courses
COURSE_GRADES       CHAR(2); value is 'A', 'A-', 'B+',
                    'B', 'B-', 'C+', 'C', 'C-',
                    'D', 'F', 'P', or 'X'
EYE_EAR_CONDITION   numeric 99; codes for eyes and ears
HEALTH_STATUS       numeric 99
LANGUAGES           CHAR(10); names of foreign languages
LANGUAGE_CAP.       numeric 9
ORDER_NO            CHAR(8); format is '999999-9'
OTHERS              subclass of STRINGS where specified
INTEGERS            numeric values where specified


Figure 7.10  Domain Definitions.




ATTRIBUTE                                DOMAIN

AID, AMID, FID, HID, MID, MEID           MID
RANK, R_RANK                             RANK
NAME                                     PERSON_NAMES
SEX                                      SEX
PRI_BRAN, R_PRIBR, SEC_BRAN, R_SECBR     BRANCHES
A_UCODE, R_UCODE, UCODE, SUP_UNIT        UNIT_CODE
UNAME                                    UNIT_NAMES
UCAT                                     UNIT_CAT
ULOC, CLOC                               LOCATION
UFUNC                                    UNIT_FUNC
ABRAN, R_ACABR                           ACADEMIC_BRANCHES
ADEG                                     ACADEMIC_DEGREES
UNIV                                     UNIVERSITY_NAMES
ASGDATE, CDATE, GDATE, RDATE, REQDATE    DATE
CCODE, R_MILED                           COURSE/SCH_CODE
CTITLE                                   COURSE/SCH_TITLES
CDESC                                    COURSE/SCH_DESCRIPTION
CGRADE                                   COURSE_GRADES
CDUR, NUMOFPERS                          INTEGERS
ORDERNO, REPNO, REQNUM                   ORDER_NO
EYECOND, EARCOND                         EYE_EAR_CONDITION
HSTAT, R_HSTAT                           HEALTH_STATUS
NLANGUAGE                                LANGUAGES
LDEGREE                                  LANGUAGE_CAPABILITY
OTHERS                                   OTHERS

Figure 7.11  Domains and Attributes for Personnel Database.

VIII.  INGRES - A RELATIONAL DATABASE SYSTEM

A.  INTRODUCTION

INGRES (Interactive Graphics and Retrieval System) is a
relational  database  and  graphics  system  which  is  implemented 
on  top  of  the  UNIX  operating  system  developed  at  Bell 
Telephone  Laboratories.  The  implementation  of  INGRES  is 
primarily  programmed  in  "C",  a  high  level  language  in  which 
UNIX  itself  is  written.  Parsing  is  done  by  using  YACC,  a 
compiler-compiler  available  on  UNIX. 

INGRES  runs  as  a  normal  user  job  on  top  of  the  UNIX 
operating  system.  The  primary  significant  modification  to 
UNIX  that  INGRES  requires  is  a  substantial  increase  in  the 
maximum  file  size  allowed. 

In  this  chapter  we  shall  describe  some  of  the  principal 
components of INGRES. These include the query language QUEL,
INGRES  utility  commands,  and  the  storage  structures 
supported. 

B.  QUEL: A RELATIONAL QUERY LANGUAGE

QUEL (QUEry Language) is a calculus based language. Each
interaction of QUEL contains one or more range-statements of
the form

     RANGE OF variable_list IS relation_name

The purpose of this statement is to specify the relation
over which each variable ranges. The variable_list portion
of a RANGE statement declares variables which will be used
as arguments for tuples. These are called tuple variables.

Each QUEL interaction also includes one or more statements
of the form

     Command [result_name] (target_list)
          [WHERE Qualification]

Here command is either RETRIEVE, APPEND, REPLACE, or
DELETE. We use square brackets ([ ]) to denote "zero or
more". For RETRIEVE and APPEND, result_name is the name of
the relation which qualifying tuples will be retrieved into
or appended to. For REPLACE and DELETE, result_name is the
name of a tuple variable which identifies tuples to be
modified or deleted. The target_list is a list of the form

     result_domain = QUEL Function ...

Here the result_domains are domain names in the result
relation which are to be assigned the values of
corresponding functions.

The goal of a query is to create a new relation for each
RETRIEVE statement. The relation so created is named by the
"result_name" clause and the domains in that relation are
named by the "result_domain" names given in the target_list.
The result_domain name may be omitted and is then taken to
be the same as the domain_name in the function. The
result_name is an optional parameter to designate that the
table returned by the query be permanently stored in the
database with the result_name as its identifier. Retrievals
that specify a result_name do not display the result table
on the terminal screen. The result_name cannot be the name
of an existing table.

To create the desired relation, first consider the product
of the ranges of all variables which appear in the
target_list and the qualification of the RETRIEVE statement.
Each term in the target_list is a function and the
Qualification is a truth function, i.e., a function with
values true or false, on the product space. The desired
relation is created by evaluating the target_list on the
subset of the product space for which the Qualification is
true, and eliminating duplicate tuples.

The QUEL examples in this chapter all concern the following
relations.

     OFFICER (MID, RANK, NAME, CITY)

     COURSES (CCODE, TITLE, CDESCRIPT, DUR, LOCATION)

     COURSE_ATTENDED (MID, CCODE, GRADE)

The following are valid QUEL interactions.

Example 1. Compute duration multiplied by 7 for course
Wepsys.

     RANGE OF C IS COURSES
     RETRIEVE INTO W
          (DUR_IN_DAYS = C.DUR * 7)
          WHERE C.TITLE = "Wepsys"

Here C is a tuple variable which ranges over the COURSES
relation, and all tuples in that relation are found which
satisfy the qualification C.TITLE = "Wepsys". The result of
the query is a new relation, W, which has a single domain
DUR_IN_DAYS that has been calculated for each qualifying
tuple. If the resulting relation is omitted, qualifying
tuples are written in display format on the user's terminal
or returned to a calling program.

Example 2. Insert the tuple (ID4, Capt, John, Salinas) into the
OFFICER relation.

    APPEND TO OFFICER (MID = "ID4", RANK = "Capt",
                       NAME = "John", CITY = "Salinas")

Here the OFFICER relation is modified by adding the indicated
tuple. Domains which are not specified default to zero for
numeric domains and null for character strings.

Example 3. Cancel all the courses which are given in Monterey.

    RANGE OF C IS COURSES
    DELETE C WHERE C.LOCATION = "Monterey"

Here C specifies that the COURSES relation is to be modified.
All tuples for which LOCATION has the value "Monterey" are
removed.

Example 4. Promote all captains to major if the officer received
the grade "A" in any course.

    RANGE OF O IS OFFICER
    RANGE OF CA IS COURSE_ATTENDED
    REPLACE O(RANK = "Maj")
        WHERE O.RANK = "Capt" AND
              O.MID = CA.MID AND CA.GRADE = "A"

Here O.RANK is replaced by "Maj" for those tuples in the OFFICER
relation for which the qualification is true.

C.   INGRES UTILITY COMMANDS

In addition to the above QUEL commands, INGRES supports a
variety of utility commands. These utility commands can be
classified into seven major categories.

1.  Invocation of INGRES:

    INGRES database_name

This command invokes INGRES. "Database_name" is the name of an
existing database. (A database is simply a named collection of
relations with a given database administrator.) Executed from
UNIX, this command "logs in" a user, who may then issue all
other commands (except those executed directly from UNIX)
within the environment of the invoked database.

2.  Creation and destruction of databases:

    CREATEDB database_name

    DESTROYDB database_name

These two commands are called from UNIX. The CREATEDB command
creates a new INGRES database. The person who executes this
command becomes the Database Administrator (DBA) for the
database. The DESTROYDB command removes all references to an
existing database. The directory of the database and all files
in that directory are removed. To execute DESTROYDB, a person
must be the DBA for "database_name".

3.  Creation and destruction of relations:

    CREATE tablename (columnname = format, columnname = format, ...)

    DESTROY tablename

The CREATE command enters a new table into the database. The
table is "owned" by the user who invokes the command. DESTROY
removes the table from the database. Only the table owner may
destroy a table. The columns are created with the type specified
by "format". The current formats accepted by INGRES are 1-, 2-,
and 4-byte integers, 4- and 8-byte floating point numbers, and
1- to 255-byte fixed length ASCII character strings.

4.  Bulk copy of data:

    COPY tablename (columnname = format, columnname = format, ...)
        into|from "filename"

    PRINT tablename

The COPY command moves data between INGRES tables and standard
files. "Tablename" is the name of an existing table. In general,
"columnname" identifies a column in the table. "Format"
indicates the storage format for the column's values in the
file. To write a file, use the "into filename" form of the COPY
command. To copy data from a file to an INGRES table, use the
"from filename" form of the command. The PRINT command displays
the contents of a specified table at a user's terminal under
predefined formats.

5.  Storage structure modification:

    MODIFY tablename TO storage_structure ON (key1, key2, ...)

    INDEX ON tablename IS indexname (key1, key2, ...)

The MODIFY command changes the storage structure of a relation
from one access method to another. Only the owner of a table can
modify that table. This command is used to accelerate the
performance of queries that access the table, particularly when
the table is large or frequently referenced. The storage
structures currently supported will be discussed in Section D of
this chapter. The indicated keys are domains in tablename which
are concatenated left to right to form a combined key; this key
is used in the organization of tuples in all but one of the
access methods. The INDEX command creates a secondary index on
an existing table in order to make retrieval and updating with
secondary keys more efficient. The secondary key is constructed
of columns from the primary table in the order given. A maximum
of six "columnname"s may be specified per index, but a user can
build any number of secondary indexes for a primary table. Only
the owner of a table is allowed to create secondary indexes on
that table. In order to maintain the integrity of the index,
users are not permitted to update secondary indexes directly.
However, whenever a primary table is changed, its secondary
indexes are automatically updated by the system.

6.  Consistency and integrity control:

    DEFINE INTEGRITY ON range_var IS qual

    DESTROY INTEGRITY tablename (integer, ..., integer | all)

    HELP INTEGRITY tablename

    RESTORE database_name

The DEFINE INTEGRITY command adds an integrity constraint for
the table referred to by "range_var". After the constraint is
defined, all updates to the table must satisfy "qual". "Qual"
must be true for every existing row in the table when the
INTEGRITY statement is issued. Updates that violate any
integrity constraint are simply not performed.

The HELP INTEGRITY command prints the current integrity
constraints on a specified table. DESTROY INTEGRITY removes
integrity constraints from a table. To destroy constraints for a
table, the integer arguments should be those printed by a HELP
INTEGRITY command on the same table. Only the table owner may
destroy integrity constraints.

The RESTORE command checks and cleans up a database after an
INGRES or operating system crash. RESTORE should be executed
after any abnormal termination to assure database integrity. The
RESTORE command is only available to the database administrator.

7.  Miscellaneous:

    HELP

    SAVE tablename UNTIL expiration_date

    PURGE database_name

HELP may be used to print information about INGRES features,
definitions of views, protections or permissions, or information
about the contents of the database and specific tables in the
database. SAVE is the mechanism by which a user declares the
intention to keep a table until a specified time. PURGE is a
UNIX command which can be invoked by a database administrator to
delete all relations whose "expiration_dates" have passed. This
should be done when space in a database is exhausted. (The
database administrator can also remove any relation from the
database using the DESTROY command, regardless of who its owner
is.)

D.   STORAGE STRUCTURES

Often the relation (table) will be stored in such a way that a
complete scan is not required. Secondary indices can also be
declared, and are used where possible to limit the number of
tuples examined.

There are five modes of relation storage structure. A relation
owner decides both the storage structure and what secondary
indices (if any) to construct; the system then carries out both
decisions automatically. The five main storage structures are:

1.  ISAM   : indexed sequential access method structure,
             duplicate rows removed

2.  CISAM  : compressed ISAM, duplicate rows removed

3.  HASHED : random hash storage structure, duplicate rows
             removed

4.  CHASH  : compressed hash, duplicate rows removed

5.  HEAP   : unkeyed and unstructured

For the first four structures the key may be any ordered
collection of domains. These schemes allow rapid access to
specific portions of a relation when key values are supplied.
The remaining non-keyed scheme (a "HEAP") stores tuples in the
file independently of their values and provides a low overhead
storage structure, especially attractive in situations requiring
a complete scan of the relation.
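
The difference between a keyed and an unkeyed structure can be illustrated with a small Python sketch. It is a simplified model, not the INGRES access methods: the class names, the bucket count, and the sample MID values are assumptions made for illustration. Each lookup returns the matching rows together with the number of rows it had to examine.

```python
class HeapRelation:
    """Unkeyed storage: rows are kept in arrival order."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def lookup(self, domain, value):
        # Placement ignores values, so every row must be examined.
        matches = [r for r in self.rows if r[domain] == value]
        return matches, len(self.rows)


class HashedRelation:
    """Keyed storage: a row's bucket is computed from its key domains."""

    def __init__(self, key_domains, n_buckets=8):
        self.key_domains = key_domains
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, row):
        key = tuple(row[d] for d in self.key_domains)
        bucket = self._bucket(key)
        if row not in bucket:  # duplicate rows removed
            bucket.append(row)

    def lookup(self, key):
        # Only the bucket selected by the key is examined.
        bucket = self._bucket(key)
        matches = [r for r in bucket
                   if tuple(r[d] for d in self.key_domains) == key]
        return matches, len(bucket)


# Hypothetical rows; the MID values are invented for illustration.
rows = [{"MID": mid, "RANK": "capt"} for mid in range(100, 140)]
heap, hashed = HeapRelation(), HashedRelation(("MID",))
for r in rows:
    heap.insert(r)
    hashed.insert(r)
hashed.insert(rows[0])  # a duplicate, silently dropped

heap_hit, heap_examined = heap.lookup("MID", 123)
hash_hit, hash_examined = hashed.lookup((123,))
print(heap_examined, hash_examined)
```

The heap examines all 40 rows for a single-key lookup, while the hashed relation examines only one bucket. For a query that must read every tuple anyway, the heap's lack of bookkeeping is the cheaper choice.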
IX.  IMPLEMENTATION OF PERSONNEL DATABASE USING ORACLE DBMS

The ORACLE relational DBMS has been used to implement the
Personnel Database because of its simplicity and clarity. SQL is
the language that is used to access and control data in an
ORACLE database. Consequently, SQL is used as the DSL for
database operations such as table and view creation, data
updates, and queries. There are nine relations in the Personnel
Database. A sample of the designed database with example data is
shown in Appendix B. A relation can be created using the CREATE
command. An example of a CREATE command that creates the OFFICER
relation is as follows:


    UFI> CREATE TABLE OFFICER
      2  ( MID       NUMBER(5) NOT NULL,
      3    RANK      CHAR(4),
      4    ONAME     CHAR(9),
      5    SEX       CHAR(1),
      6    PRI_BRAN  CHAR(6),
      7    SEC_BRAN  CHAR(6),
      8    OTHERS    CHAR(6) );

    Table created.


After the relation is created, tuples of OFFICER can be inserted
using the INSERT command.

    UFI> INSERT INTO OFFICER
      2  VALUES (27363, 'capt', 'Johnson', 'm', 'artill', 'pilot');

    1 record created.

    UFI> INSERT INTO OFFICER
      2  VALUES (12239, 'maj', 'Hernandez', 'm', 'inftry', 'spefc');

    1 record created.

Users of the Personnel Database can obtain information by asking
queries such as the following samples.

1.  List all tuples in the relation OFFICER using the SELECT
command.

    UFI> SELECT *
      2  FROM officer;

      MID  RANK  ONAME      SEX  PRI_BRAN  SEC_BRAN  OTHERS

    27363  capt  Johnson    m    artill    pilot
    12239  maj   Hernandez  m    inftry    spefc
    32458  1lt   Robbins    f    airdef    adp
    43596  2lt   Smith      m    medic     pilot
    10999  lcol  Brown      m    inftry    adp
    35768  1lt   Greenberg  m    sigcor    pilot
    29364  capt  James      m    mileng    spefc
    16745  maj   Leighton   m    financ    adp
    10792  col   Stone      m    ordnan    artill

    9 records selected.


2.  List all officers who were assigned between the dates
1-JAN-1966 and 1-JAN-1980.

    UFI> SELECT MID, RANK, ONAME
      2  FROM OFFICER
      3  WHERE MID IN
      4     ( SELECT AMID
      5       FROM ASSIGNMENT
      6       WHERE ASGDATE BETWEEN '1-JAN-66' AND '1-JAN-80');

      MID  RANK  ONAME

    10792  col   Stone
    10999  lcol  Brown
    16745  maj   Leighton
    27363  capt  Johnson
    29364  capt  James
    12239  maj   Hernandez

    6 records selected.

3.  List military_IDs, ranks, names, and primary branches of all
officers who have taken an 'adp' course.

    UFI> SELECT MID, RANK, ONAME, PRI_BRAN
      2  FROM OFFICER
      3  WHERE MID IN
      4     ( SELECT MEID
      5       FROM M_EDUCATION
      6       WHERE CCODEA IN
      7          ( SELECT CCODEB
      8            FROM M_COURSES
      9            WHERE CTITLE = 'adp'
     10          ) );

      MID  RANK  ONAME    PRI_BRAN

    32458  1lt   Robbins  airdef
    10999  lcol  Brown    inftry

4.  List all unit categories for units.

    UFI> SELECT UNIQUE UCAT
      2  FROM UNIT;

    UCAT

    div
    brg
    dep
    hosp
    reg
    bt
    bn

    7 records selected.

5.  List military_IDs, ranks, names, sex, and primary branches
for all officers who speak German and took the course 'SS002'.

    UFI> SELECT MID, RANK, ONAME, SEX, PRI_BRAN
      2  FROM OFFICER
      3  WHERE MID IN
      4     ( SELECT MEID
      5       FROM M_EDUCATION
      6       WHERE CCODEA = 'SS002' )
      7  AND MID IN
      8     ( SELECT FID
      9       FROM LANGUAGE
     10       WHERE NLANGUAGE = 'german' );

      MID  RANK  ONAME      SEX  PRI_BRAN

    35768  1lt   Greenberg  m    sigcor

6.  Order all military_education tuples by course_code, and
within course_code put them in descending course_grade order.

    UFI> SELECT *
      2  FROM M_EDUCATION
      3  ORDER BY CCODEA, CGRADE;

    CCODEA   MEID  CGRADE  CDATE

    AA102   43596  A-      12-JUL-9a
    AD002   32458  A       23-NOV-92
    AS003   27363  A+      13-APR-77
    CS502   32458  B+      26-OCT-ea
    CS509   10999  A-      31-JAN-77
    HS706   43596  A       22-JUN-92
    IA076   16745  A-      22-NOV-76
    IS005   10999  A       11-OCT-70
    IS005   12239  A-      30-SEP-72
    OC092   10792  A-      01-DEC-69
    SS002   35768  B+      26-FEB-a2

    11 records selected.

A view may derive data from more than one relation. This is done
by defining the view with a join query. An example of such a
view (view OFFICEREDUC) is shown below.

7.  Create an OFFICEREDUC view from a join of the OFFICER
relation to the A_EDUCATION relation.

    UFI> CREATE VIEW OFFICEREDUC (VID, RANK, VNAME, VBRAN, VDEG, SEX) AS
      2  SELECT MID, RANK, ONAME, ABRAN, ADEG, SEX
      3  FROM OFFICER, A_EDUCATION
      4  WHERE OFFICER.MID = A_EDUCATION.AID;

    View created.


8.  Count the number of officers who are captains and have a
'BS' academic degree.

    UFI> SELECT COUNT(RANK)
      2  FROM OFFICEREDUC
      3  WHERE VDEG = 'BS'
      4  AND RANK = 'capt';

    COUNT(RANK)
              2

9.  List all assignments, ordered by assignment date, for the
officer identified by '10792' as MID.

    UFI> SELECT MID, ONAME, UNAME, ASGDATE
      2  FROM OFFICER, ASSIGNMENT, UNIT
      3  WHERE MID = 10792 AND MID = AMID
      4  AND UCODE = A_UCODE
      5  ORDER BY ASGDATE;

      MID  ONAME  UNAME           ASGDATE

    10792  Stone  9th Art Brg     01-SEP-66
    10792  Stone  2nd Inf Div     11-JAN-71
    10792  Stone  64th Ord Depot  15-APR-78

10.  List officers together with the courses they have taken,
restricted to courses with a duration of at least one year.

    UFI> SELECT RANK, ONAME, CDESC, CLOCB, CDUR
      2  FROM OFFICER, M_COURSES, M_EDUCATION
      3  WHERE MID = MEID
      4  AND CCODEA = CCODEB
      5  AND CDUR >= 52;

    RANK  ONAME  CDESC                 CLOCB          CDUR

    2lt   Smith  Aca. of Health Sci.   Ft.Houston,Tx    96
    2lt   Smith  Army Aviation School  Ft.Rucker,AL     52

11.  Compute and display the sum of the durations of the
course(s) taken by officer 'Smith'.

    UFI> SELECT SUM(CDUR)
      2  FROM OFFICER, M_COURSES, M_EDUCATION
      3  WHERE MID = MEID
      4  AND ONAME = 'Smith'
      5  AND CCODEA = CCODEB;

    SUM(CDUR)
          148

12.  List all officers and assignment requests, including rank,
officer name, military_ID, and request number, for assignment
requests where the officers' specialties meet the requirements
specified in that request.

    UFI> SELECT REQNUM, MID, RANK, ONAME
      2  FROM OFFICER, ASGREQ, A_EDUCATION, M_EDUCATION, MEDICAL
      3  WHERE MID = MEID
      4  AND MID = AID
      5  AND MID = HID
      6  AND PRI_BRAN = R_PRIBR
      7  AND SEC_BRAN = R_SECBR
      8  AND ABRAN = R_ACABR
      9  AND CCODEA = R_MILED
     10  AND HSTAT <= R_HSTAT;

    REQNUM      MID  RANK  ONAME

    527685-1  12239  maj   Hernandez
    031084-3  16745  maj   Leighton

X.  FUNCTIONS OF A DATABASE MANAGEMENT SYSTEM


A.   INTRODUCTION


The principal function of the DBMS is to store, retrieve, and
modify data. However, in an operational environment the DBMS
should provide other important functions. In this chapter, we
will first describe the major DBMS functions and then discuss
three of these functions in detail: recovery, concurrency, and
security.

Major DBMS functions are described by Codd, E.F. in
[Ref. 20:p. 114], and they are shown in Figure 10.1.


    1.  Store, retrieve, and update data.
    2.  Provide recovery services in case of failure.
    3.  Provide concurrency control services.
    4.  Provide security facilities.
    5.  Provide integrity services to enforce database
        constraints.
    6.  Provide a user-accessible catalog of data descriptions.
    7.  Support logical transactions.
    8.  Interface with communications control programs.
    9.  Provide utility services.

Figure 10.1   Major Functions of a DBMS.


The first and fifth functions have been discussed in previous
chapters. Recovery, concurrency control, and security facilities
will be discussed later in this chapter.

The management of large, complex databases is difficult.
Maintaining a database with ten record types and hundreds of
data-items can be time-consuming. If a database is processed by
hundreds of application programs, then changes to records and
data-items can be risky. Questions like "Which programs are
affected by a change to eight-digit Unit_location_Codes?" or
"Which records contain MID?" are frequent. Since most databases
are self-describing, much of the data needed to answer these
questions is stored within the database. However, this
information may not be readily accessible to humans. For that
reason, a "user-accessible catalog", which contains data
descriptions and data about the relationships between programs
and data, is very useful to the user and should be provided by
the DBMS.

A logical transaction is a sequence of activities performed
atomically. Usually, a transaction includes several actions on
the database. Unfortunately, the DBMS product cannot know which
groups of actions are logically related. Thus the DBMS must
provide facilities for the application programmer to define
transaction boundaries, which are needed in handling the
concurrency control and recovery functions.

In addition to these functions, the DBMS must interface with a
communications control subsystem which controls the flow of
transactions to application programs from the DBMS. Finally, the
DBMS must provide utility programs to facilitate database
maintenance. These utility programs may be used to unload,
reload, and execute the database; or they may be used to make
mass insertions of data into, or mass deletions of data from,
the database.

"No current DBMS provides all of these functions in a
satisfactory way. These capabilities can be used as a checklist
of decision criteria for a DBMS. A system that does not provide
most of them is not truly a DBMS."

B.   RECOVERY

Computer and database systems can fail in many ways. Computers
stop unexpectedly, disk heads crash, operators may drop disks,
programs may have bugs, and so on. Database systems must include
not only a variety of checks and controls to reduce the
likelihood of failure, but also an extensive set of procedures
for recovering from the failures that will inevitably occur
despite those checks and controls.

In an operational environment there are many possible causes of
failure, such as:

-  programming errors, in an application or in the database
   system;

-  hardware errors, on the device, the channel, or the CPU;

-  operator errors, such as mounting a wrong tape;

-  fluctuations in the power supply;

-  fire in the computer system room.

If such an error occurs during a database interaction, the
database can be left in an inconsistent state. Recovery software
is used to restore the database to some previous consistent
state.

1.   Recovery via Reprocessing

There are a variety of recovery algorithms. The simplest is to
keep a back-up copy of the database. This copy is created
periodically, once or twice a day. Then, when a failure occurs,
the last back-up copy is used to restore the database, and any
transactions processed since that copy was made are run again.
This algorithm is called "recovery via reprocessing", and it has
several drawbacks. First, reprocessing transactions takes the
same amount of time as processing them the first time. This
means that one day will be required to recover one day of
processing. If the system is heavily loaded, it may never catch
up. Second, when transactions are processed concurrently, it is
impossible to guarantee that they can be reprocessed in the same
order as they were originally processed. For these reasons,
reprocessing is not a viable form of recovery.

2.   Transactions

The fundamental purpose of the database system is to process
transactions. A transaction is a program, or a program part,
that can read from or write into the database. It consists of
the execution of an application-specific sequence of operations.
These operations can be of five types: BEGIN TRANSACTION, READ,
WRITE, COMMIT, and ROLLBACK. All transactions begin with a BEGIN
TRANSACTION operation. READ causes a page or record to be read
from the database. WRITE causes a new copy of a page or record
to be written into the database. COMMIT tells the system that
the transaction has terminated successfully and that all of its
updated pages or records should be permanently reflected in the
database. ROLLBACK tells the system that the transaction has
terminated abnormally and that the records or pages it wrote
into should be returned to their previous state. A transaction
can have only one COMMIT or ROLLBACK processed, and transactions
cannot be nested.

The recovery manager processes the READ, WRITE, COMMIT, and
ROLLBACK commands. It also handles system failures, and so
provides reliability for the DBMS.

3.   Recovery via Rollback/Rollforward

This approach uses the following four-step algorithm:

1.  Recreate the outputs of all successfully completed
    transactions. (Transactions which ended with a COMMIT
    operation.)

2.  Abort all transactions in process at the time of failure.

3.  Remove database changes generated by aborted transactions.

4.  Restart aborted transactions.

This algorithm will appropriately recover the database. Some
transaction outputs, however, cannot be undone. These outputs
are called "real outputs" by Gray in [Ref. 21:pp. 223-2^2]. Real
outputs are messages which are received by people who are using
the system, like order confirmations or inputs to other
transactions. The message "OFFICER, (MID:9999999), IS ASSIGNED
TO THE UNIT, (UNIT_ID:9999), ASSIGNMENT ORDER NO IS 99999-9" is
an example of a real output. Because they cannot be undone, real
outputs should not be produced until the transaction is
completed. It is recommended that a log of real outputs be
maintained. When the transaction is completed, the entries on
the log are updated and the real outputs become visible. To cope
with a failure that occurs while the real outputs are being
produced, each output can be numbered and a log kept of the real
outputs that have been produced.
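
The handling of real outputs can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions, not a mechanism taken from [Ref. 21]: the class name `OutputLog` and its methods are hypothetical. Messages are held back until the producing transaction commits, and each released message receives a sequence number so that a restart can tell which outputs were already produced.

```python
class OutputLog:
    """Holds real outputs until the producing transaction commits."""

    def __init__(self):
        self.pending = {}    # txn_id -> messages not yet visible
        self.delivered = []  # (sequence_number, message) already released

    def record(self, txn_id, message):
        # The message is logged but not shown to anyone yet.
        self.pending.setdefault(txn_id, []).append(message)

    def commit(self, txn_id):
        # On completion, the transaction's outputs become visible,
        # each tagged with the next sequence number.
        for message in self.pending.pop(txn_id, []):
            self.delivered.append((len(self.delivered) + 1, message))

    def rollback(self, txn_id):
        # Nothing was released, so there is nothing to undo.
        self.pending.pop(txn_id, None)


outputs = OutputLog()
outputs.record("T1", "ASSIGNMENT ORDER ISSUED")
outputs.record("T2", "ASSIGNMENT ORDER ISSUED")
outputs.commit("T1")
outputs.rollback("T2")
print(outputs.delivered)  # prints [(1, 'ASSIGNMENT ORDER ISSUED')]
```

Only the committed transaction's message is ever delivered; the aborted transaction's message is discarded without having reached a user.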

4.   Transaction Logging

To apply the UNDO (rolling back a transaction) and REDO (rolling
forward a transaction) processes to a database system, a log
should be kept of transaction results. The log includes the old
and new values of all items updated by the transaction, and it
is in chronological order. The log resides on either disk or
tape. When a failure occurs, the log is used to both UNDO and
REDO transactions, as shown in Figure 10.2 and Figure 10.3,
respectively.

    DATABASE WITH CHANGES ----+
                              +--> UNDO --> DATABASE WITHOUT CHANGES
    BEFORE IMAGES ------------+

Figure 10.2   UNDO Transaction Procedure.

    DATABASE WITHOUT CHANGES --+
                               +--> REDO --> DATABASE WITH CHANGES
    AFTER IMAGES --------------+

Figure 10.3   REDO Transaction Procedure.

To UNDO a transaction, the log must contain a copy of every
database record before it was changed. Such records are called
before images. An UNDO procedure is performed by applying before
images to the database. To REDO a transaction, the log must
contain a copy of every database record after it was changed.
These records are called after images. A REDO procedure is
performed by applying after images to the database. Possible
data-items of a transaction log are shown in Figure 10.4.

    Transaction ID       Operation Type
    Reverse Pointer      Object
    Forward Pointer      Old Value
    Time                 New Value

Figure 10.4   Data-items of a Log Record.


For identification purposes each transaction in the log has a
unique ID. All images belonging to a transaction are linked
together in a doubly linked list. These forward and backward
links can be used by the recovery manager to locate all records
for a particular transaction. The other data-items are: the time
of the action, the type of the operation (modify, insert, etc.),
the object (such as the record type and identifier), and the old
and new values.
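
A log of this kind, and its use for UNDO and REDO, can be sketched in Python. The sketch rests on several assumptions that are not in the text: the field names are invented, the position of a record in a Python list stands in for the time field and the forward and reverse pointers, every logged object already exists in the database, and no uncommitted transaction wrote an object that a committed transaction wrote afterwards (as would hold under the locking discussed later in this chapter).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LogRecord:
    txn_id: str                # transaction ID
    op: str                    # "BEGIN", "WRITE", "COMMIT" or "ROLLBACK"
    obj: Optional[str] = None  # identifier of the record written
    old: Any = None            # before image
    new: Any = None            # after image


def recover(db, log):
    """Bring db to a consistent state using a chronological log."""
    committed = {r.txn_id for r in log if r.op == "COMMIT"}
    # REDO: apply after images of committed transactions, oldest first.
    for r in log:
        if r.op == "WRITE" and r.txn_id in committed:
            db[r.obj] = r.new
    # UNDO: apply before images of all other transactions, newest first.
    for r in reversed(log):
        if r.op == "WRITE" and r.txn_id not in committed:
            db[r.obj] = r.old
    return db


log = [
    LogRecord("T1", "BEGIN"),
    LogRecord("T1", "WRITE", obj="A", old=100, new=150),
    LogRecord("T2", "BEGIN"),
    LogRecord("T2", "WRITE", obj="B", old=50, new=20),
    LogRecord("T1", "COMMIT"),
    # failure occurs here: T2 never reached COMMIT
]

# State found on disk after the failure: T1's change was lost,
# T2's uncommitted change had already been written.
on_disk = {"A": 100, "B": 20}
recovered = recover(on_disk, log)
print(recovered)  # prints {'A': 150, 'B': 50}
```

The committed change to A is rolled forward from its after image and the uncommitted change to B is rolled back from its before image, which corresponds to steps 1 to 3 of the rollback/rollforward algorithm.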

5.   Write-ahead Log

There is an interval between writing a change to the database's
stable storage and writing the log record representing that
change. These are two distinct operations. This fact introduces
two questions: What happens if a failure occurs in the interval
between these two operations? What should be done to avoid
improper recovery?

Suppose that such a failure does in fact occur, so that only one
of the writes (the first operation) is performed and the other
is lost. If the performed operation is the database write, there
will be changes in the database that are not recorded in the
log, so the UNDO process is not possible for these changes. It
is obvious that, for safety, the log record should always be
written first. Therefore we can define the write-ahead log
protocol as follows:

1.  A transaction is not allowed to write a record to the stable
    storage of the database until at least the before image
    portion of the log record has been written to the physical
    log.

2.  A transaction is not allowed to complete COMMIT processing
    until both the before images and the after images of all log
    records for the transaction have been written to the
    physical log.

If a failure occurs, a change may be recorded in the log and not
in the database. In this case, the recovery manager may attempt
to undo changes that have not yet occurred. This is not a
problem, because the recovery manager will only be placing
before images in the database. Records will be replaced by
copies of themselves. This is a wasteful operation but not a
harmful one.
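
The ordering imposed by the write-ahead log protocol, and the harmlessness of undoing a change that never reached the database, can be demonstrated with a short Python sketch. The class `WalStore` and its `crash_before_db_write` flag are hypothetical devices for simulating a failure between the two writes. Because each log entry here carries both images at once, the second rule of the protocol is satisfied trivially.

```python
class WalStore:
    """Toy store that always writes the log record before the database."""

    def __init__(self, data):
        self.stable = dict(data)  # the database's stable storage
        self.log = []             # physical log: (txn, obj, before, after)

    def write(self, txn, obj, value, crash_before_db_write=False):
        # Rule 1: the before image reaches the log first.
        self.log.append((txn, obj, self.stable.get(obj), value))
        if crash_before_db_write:
            raise RuntimeError("simulated failure after the log write")
        self.stable[obj] = value

    def undo(self, txn):
        # Apply before images, newest first.
        for t, obj, before, _after in reversed(self.log):
            if t == txn:
                self.stable[obj] = before


store = WalStore({"F": 2})
store.write("TA", "F", 4)  # logged, then written to the database
try:
    store.write("TA", "F", 8, crash_before_db_write=True)
except RuntimeError:
    pass                   # the change is in the log only

store.undo("TA")
print(store.stable)        # prints {'F': 2}
```

During the undo, the second log entry replaces the value 4 with its own before image, 4, which is the wasteful but harmless case; the first entry then restores the original value 2.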

C.   CONCURRENCY CONTROL

Given a correct state of the database as input, a correct
transaction will produce a correct state of the database as
output. Even if all transactions are individually correct,
however, it is possible in a multiuser system for transactions
that execute concurrently to interfere with one another in such
a way as to produce an overall result that is not correct. As an
example of that kind of interference, we will consider the "lost
update" problem.




1.   Concurrent Update (Lost Update) Problem

The lost update problem can be represented as shown in
Figure 10.5.


    Time   Transaction A (TA)       Transaction B (TB)

    t1     copy tuple i from        *
           relation R1
    t2     *                        copy tuple i from
                                    relation R1
    t3     modify tuple i,          *
           and update
    t4     *                        modify tuple i,
                                    and update

Figure 10.5   Lost Update Problem.


Transaction A is intended to change some field F in tuple i;
say, to double the value of field F. Transaction B is also
intended to double the value of that same field. Thus, if the
initial value of that field is 2, then running the two
transactions one at a time, without concurrency, will produce a
final result of 8. However, the particular concurrent execution
sequence shown in Figure 10.5 produces a final result of 4. That
particular execution sequence is therefore incorrect. In this
situation, we can say that TA's update is lost because TB
overwrites it.
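
The arithmetic of the lost update can be traced step by step in Python. The two functions below are illustrative only (their names are invented); the second replays the schedule of Figure 10.5 in a single thread so that the result is deterministic.

```python
def serial(initial):
    """Run TA, then TB, one at a time."""
    db = {"F": initial}
    for _txn in ("TA", "TB"):
        local = db["F"]      # copy tuple i
        db["F"] = local * 2  # modify tuple i, and update
    return db["F"]


def interleaved(initial):
    """Replay the schedule of Figure 10.5."""
    db = {"F": initial}
    copy_a = db["F"]         # t1: TA copies tuple i
    copy_b = db["F"]         # t2: TB copies the same, unmodified tuple
    db["F"] = copy_a * 2     # t3: TA updates
    db["F"] = copy_b * 2     # t4: TB updates, overwriting TA's result
    return db["F"]


print(serial(2), interleaved(2))  # prints 8 4
```

TB's copy at t2 is taken before TA's update at t3, so TB doubles the stale value 2 rather than TA's result 4.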

2.   Resource Locking

The most common method of concurrency control is to use locks.
One lock is maintained for each user. The term user refers to
the user of the DBMS, not necessarily the system user. Thus a
user can be either a person using the DBMS query/update facility
via a terminal, or an application program that calls upon the
DBMS for service. A program obtains all such locks before making
any updates. Concurrency control must ensure that at most one
program holds the lock on a given part of the database at any
time. Hence, if a program wishes to move a locked item to its
working storage, it must wait until the previous program
releases the lock. The implementation of this process differs
from one system to another. In many implementations, user
programs include commands to lock the required records before
updating them. This can be represented pictorially as shown in
Figure 10.6.


Time    Transaction A (TA)        Transaction B (TB)

  *         *                         *
 t1     lock R1. Copy tuple           *
        i from relation R1.           *
  *         *                         *
 t2         *                     attempt to place a
            *                     lock on R1
  *         *                     wait
 t3     modify tuple i,           wait
        and update                wait
  *         *                     wait
 t4     release R1                wait
 t5         *                     lock R1. Copy tuple
            *                     i from relation R1.
  *         *                         *
 t6         *                     modify tuple i, and
            *                     update
  *         *                         *
 t7         *                     release R1

Figure 10.6    Resource Locking.
It is seen that transaction B is now made to wait at
time t2, because its request for a lock on R1 at that time
conflicts with the lock already held on R1 by transaction A.
Transaction B resumes after transaction A releases its
lock. This kind of lock mechanism provides a correct
final result for those two transactions. Lost update problems
can therefore be solved by the lock mechanism.
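
A minimal sketch of the same two transactions under locking follows. It assumes Python's standard `threading.Lock` as the lock on R1; the dictionary and the function name are hypothetical stand-ins, not a DBMS interface. Whichever thread acquires the lock first, the other waits until it is released, so each doubling sees the result of the previous one.

```python
import threading

relation_r1 = {"i": 2}          # hypothetical tuple i with field F = 2
lock_r1 = threading.Lock()      # a single lock guarding relation R1


def double_field(db, lock):
    """Lock R1, copy tuple i, modify it, update it, then release R1."""
    with lock:                      # a second transaction waits here
        local_copy = db["i"]        # copy tuple i from relation R1
        db["i"] = local_copy * 2    # modify tuple i and update
    # leaving the "with" block releases the lock


ta = threading.Thread(target=double_field, args=(relation_r1, lock_r1))
tb = threading.Thread(target=double_field, args=(relation_r1, lock_r1))
ta.start()
tb.start()
ta.join()
tb.join()

locked_result = relation_r1["i"]
print(locked_result)
```

The read and the write sit inside the same locked region; locking only the write would still allow both transactions to read the value 2.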

3.   Deadlock

Locks can introduce the problem commonly known as
deadlock. A deadlock occurs when two transactions, say TA
and TB, each place a lock on a relation, say R1 and R2
respectively, and then each transaction attempts to place a
lock on the relation already locked by the other. The order of
processing can be as shown in Figure 10.7.


Time    Transaction A (TA)        Transaction B (TB)

  *         *                         *
 t1     lock R1                       *
  *         *                         *
 t2         *                     lock R2
  *         *                         *
 t3     attempt to place              *
        a lock on R2                  *
  *     wait                          *
  *     wait                          *
 t4     wait                      attempt to place
                                  a lock on R1
  *     wait                      wait
  *     wait                      wait

Figure 10.7    Deadlock Problem.
Both transactions are then waiting for each other to
release a lock. In the database environment, the usual step
to resolve, or "break", the deadlock is to roll back one of
the programs. Breaking a deadlock consists of choosing a
"victim", one of the deadlocked transactions, and rolling it
back. The victim is not necessarily the transaction that
actually caused the deadlock; it may be the one holding the
fewest locks, or the one that was most recently started, or
the one that has made the fewest updates. The rollback
process involves the following jobs:

1.   Terminate the victim transaction and undo all of
its updates.
2.   Release all the locks of the transaction; the resources
can now be allocated to other transactions.
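
The victim selection and rollback steps above can be sketched as follows. The text does not say how a DBMS detects the deadlock; the wait-for mapping used here is a common technique brought in for illustration, and every name in the code (including the extra lock R3) is hypothetical. Undoing the victim's updates is not modelled; only the lock bookkeeping is shown.

```python
def find_cycle(waits_for):
    """Return the transactions on a wait-for cycle, or [] if there is none.

    waits_for maps each waiting transaction to the single transaction
    that holds the lock it is waiting for.
    """
    for start in waits_for:
        seen = []
        current = start
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current in seen:
            return seen[seen.index(current):]
    return []


def choose_victim(cycle, locks_held):
    """Policy from the text: pick the transaction holding the fewest locks."""
    return min(cycle, key=lambda t: len(locks_held[t]))


def roll_back(victim, waits_for, locks_held):
    """Terminate the victim and release all of its locks."""
    waits_for.pop(victim, None)
    released = locks_held.pop(victim)
    # Transactions that were waiting on the victim can now proceed.
    for waiter in [t for t, holder in waits_for.items() if holder == victim]:
        del waits_for[waiter]
    return released


# State at time t4 of the deadlock figure; R3 is an assumed extra lock so
# that the "fewest locks" policy has something to compare.
waits_for = {"TA": "TB", "TB": "TA"}
locks_held = {"TA": {"R1", "R3"}, "TB": {"R2"}}

cycle = find_cycle(waits_for)
victim = choose_victim(cycle, locks_held)
released = roll_back(victim, waits_for, locks_held)
remaining_cycle = find_cycle(waits_for)
print(cycle, victim, released, remaining_cycle)
```

Swapping the `key` function in `choose_victim` gives the other policies mentioned in the text, such as the most recently started transaction or the one with the fewest updates.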

4.   Lock Granularity

So far, in our examples, we assumed that the unit of
locking is the individual record. However, the level of the
lock can be different in different applications, or in
different DBMSs. Locks, at the highest level, can be applied
to an entire database. This strategy is used by DBMS products
that invoke the lock for a short time during the
processing of a single database request. Locks can also be
applied, at the lowest level, to a specific field within an
individual record. In between these extremes, locks can be
placed on records, on pages or blocks, and on files. As
usual, there are tradeoffs among these alternatives. A lock
on the entire database is simple for the DBMS to manage.
However, throughput may be low because of less concurrency.
On the other hand, locks of small granularity are more
complex to manage, but throughput tends to be higher
because of more concurrency. The choice among alternatives
depends on requirements.
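
The tradeoff can be made concrete with a small sketch. It assumes, purely for illustration, that every lock is exclusive and that a lock target is written as a path from the database down to the locked item; neither convention comes from the text. Two requests then conflict exactly when one target contains the other.

```python
def conflicts(held, requested):
    """Return True when one lock target contains the other.

    A target is a tuple path: ("db",) is the whole database,
    ("db", "officer") a file, ("db", "officer", 27363) a record.
    """
    shorter = min(len(held), len(requested))
    return held[:shorter] == requested[:shorter]


database_lock = ("db",)
file_lock = ("db", "officer")
record_a = ("db", "officer", 27363)
record_b = ("db", "officer", 12239)

coarse_blocks = conflicts(database_lock, record_b)   # whole database vs. one record
file_blocks = conflicts(file_lock, record_a)         # file vs. a record inside it
fine_blocks = conflicts(record_a, record_b)          # two different records
print(coarse_blocks, file_blocks, fine_blocks)
```

With a database-level lock every other request waits; with record-level locks two users touching different records proceed concurrently, at the cost of a larger lock table to manage.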
B.   DATABASE SECURITY

Security in the database environment is the protection of
the database against unauthorized disclosure, alteration, or
destruction. The subject of database security has many
different aspects and approaches, such as physical protection,
hardware controls, passwords, and authorization
tables. Here we are concerned primarily with
restricting certain users so they are allowed to access
and/or modify only a subset of the database.

Good security means that people have access to the data
that they need to accomplish their job function, and no
more. Job functions vary, and for that reason data access
authorizations will vary. A table of authorization rules,
developed by Fernandez, Summers, and Wood [Ref. 22:p. 5],
is used for that purpose.

Authorization rules are compiled and stored in the
system dictionary. First, these rules are entered into
the system; then they are enforced. The authorization
rules compiler and the corresponding enforcement mechanism
together make up the security subsystem.

In the application environment it is convenient to use a
matrix for authorization rules. In this authorization
matrix, rows correspond to users and
columns correspond to data objects. The entry A[i,j] represents
the set of authorization rules that apply to user i
with respect to data object j. An example of an authorization
matrix is shown in Figure 10.8.

                           DATA OBJECT1   DATA OBJECT2   DATA OBJECT3
                           (OFFICER       (UNIT          (COURSES
                           relation)      relation)      relation)

USER1 (Brown)
USER2 (John)
USER3 (Personnel Office)
USER4 (Prog-3)
USER5 (Education Office)

[Cell entries are drawn from ALL, NONE, READ, and UPDATE; the scanned
source does not preserve which entry belongs to which cell.]

Figure 10.8   An Example of an Authorization Matrix.

The sophistication of the security subsystem can be measured
by the granularity of the objects. For example, some DBMS
systems support authorization only at the level of whole
relations; others permit authorization at the level of individual
fields. In our example authorization is based on the
names of objects and not on their values. This is called
value-independent control. In this scheme, the system can
enforce the controls without having to access the data
objects themselves. It is also possible to provide value-dependent
control by extending the entries in the
matrix to include an optional access predicate. For
example, the entry

     SELECT *
     FROM OFFICER
     WHERE RANK = 'CAPT'

might be used to allow SELECT access to some officers and
not others.
Authorization rules can also specify that certain field
combinations are prohibited, even though the individual
fields within the combination may be accessible. It is also
necessary to control access to programs. Moreover, it is
important to control access to the authorization matrix
itself.
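
The enforcement of an authorization matrix with optional access predicates can be sketched as below. This is a minimal illustration, not the design of [Ref. 22] or of any DBMS: the dictionary layout, the function name, and in particular the matrix entries are hypothetical and are not the entries of the figure in this section.

```python
# Hypothetical entries: (user, data object) -> {operation: predicate or None}.
# None means value-independent access; a function is an access predicate
# evaluated against the row being accessed (value-dependent control).
AUTH_MATRIX = {
    ("Brown", "OFFICER"): {"READ": None, "UPDATE": None},
    ("John", "OFFICER"): {"READ": lambda row: row["RANK"] == "CAPT"},
    ("John", "UNIT"): {},
}


def is_allowed(user, data_object, operation, row=None):
    """Return True if the matrix entry A[user, data_object] permits the operation."""
    entry = AUTH_MATRIX.get((user, data_object), {})
    if operation not in entry:
        return False                 # no rule for this operation: deny
    predicate = entry[operation]
    if predicate is None:
        return True                  # decided from object names alone
    return row is not None and predicate(row)


captain = {"MID": "27363", "RANK": "CAPT"}
major = {"MID": "12239", "RANK": "MAJ"}

brown_update = is_allowed("Brown", "OFFICER", "UPDATE")
john_reads_captain = is_allowed("John", "OFFICER", "READ", captain)
john_reads_major = is_allowed("John", "OFFICER", "READ", major)
john_reads_unit = is_allowed("John", "UNIT", "READ")
print(brown_update, john_reads_captain, john_reads_major, john_reads_unit)
```

Value-independent entries are decided without touching the data, whereas a predicate entry requires the row itself; that is the cost of value-dependent control noted in the text.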
XI.  CONCLUSIONS

Information is a basic resource for an enterprise, like
people or money, and it should have a professional management
group that is responsible for its effective use
throughout the enterprise. To achieve this, a new
staff function called information resource management (IRM)
has been proposed. This function, in most cases, should
establish policies and procedures to guide users, system
developers, and managers so that their decisions will be
consistent and compatible and will employ the best of currently
available technology. In a DBMS, this function is referred
to as Database Administration (DBA). [Ref. 15:pp. 168-183]

On the other hand, the personnel administration function
requires that the managers of an organization have complete control
over evaluating, assigning, and firing their own employees.
In order to perform this task satisfactorily and effectively,
the managers have to make their own decisions very
accurately. Sometimes, they are forced to make such decisions
in a short period of time. These requirements of effective
personnel management can be met by having a well-designed
personnel database and a suitable DBMS.

James P. Fry and Edgar H. Sibley state in their 1976
paper [Ref. 23] that the objectives of database management
are:

-to make an integrated collection of data available to a
 wide variety of users (data availability),
-to provide for quality and integrity of the data (data
 quality),
-to insure retention of privacy through security measures
 within the system (privacy and security),
-to allow centralized control of the database, which
 is necessary for efficient data administration
 (management control), and
-to provide a high degree of data independence.
Considering those major objectives and some advantages
such as simplicity, ease of use, data independence, and
theoretical foundation, the relational database model has
been found to be convenient in designing such a personnel
database system. The other database models are more complex
and more difficult to implement.

After the organization's requirements are understood,
the selection process usually begins by choosing the data model that
seems most appropriate and then proceeds to a detailed
evaluation of only those available DBMS products that support
the selected model. This is the problem of choosing a DBMS.
Several committees are working on standards so that all
DBMSs will provide the same functions through standard
interfaces. In this thesis, the ORACLE DBMS is used to
show the implementation stage of the personnel database.

In conclusion, it is useful to emphasize that it is
important not only to design an efficient database for an enterprise
but also to maintain and develop the
database by continuously monitoring its performance to
maximize efficiency. This is a final operational responsibility of
the database administration (DBA).
APPENDIX A
SEMANTIC DATABASE DESIGN

The detailed description of the Semantic Database Model
(SDM) design for the Personnel Database, which is mentioned
in Chapter V, is shown below.
OFFICER
   description: All officers who are on active duty.
   member attributes:
      Military_ID
         description: A unique number for each officer
         value class: MID
         mandatory
         not changeable
      Rank
         value class: RANK
      Date_of_promotion
         value class: DATE
      Name
         value class: PERSON_NAMES
      Birth_date
         value class: DATE
      Beginning_date_to_active-duty
         description: Date of first day of being on
                      active duty.
         value class: DATE
      Native_country
         value class: COUNTRY
      Sex
         value class: SEX
      Marital_status
         value class: MARITAL_STATUS
      Number_of_children
         value class: INTEGERS
      Permanent_address
         value class: ADDRESS
      Current_address
         value class: ADDRESS
      Primary_branch
         description:
         value class: BRANCHES
      Secondary_branch
         description:
         value class: BRANCHES
      Academic_education
         value class: ACADEMIC_MAJOR
         match: Academic_branch/Academic_degree of
                ACADEMIC_MAJOR on AID.
         multivalued

Figure A.1    SDM Design for Personnel Database.
      Military_education/courses
         value class: MILITARY_EDUCATION/COURSES
         match: Course/Military_school_code of
                MILITARY_EDUCATION/COURSES on MEID.
         multivalued
      Health_condition
         value class: MEDICAL_INFO
         match: General_health_status of
                MEDICAL_INFO on HID.
      Foreign_language_capability
         value class: FOREIGN_LANGUAGE
         match: Foreign_language of
                FOREIGN_LANGUAGE on FID.
         multivalued
      Unit_assigned
         description: Units to which the officer has been
                      assigned until the current date.
         value class: UNIT
         inverse: Officer_assigned
         multivalued
   identifiers:
      Military_ID


ACADEMIC_MAJOR
   description: Type of academic branch, the degree
                earned for that branch, location
                and name of the university which
                the officer attended.
   member attributes:
      Academic_branch
         description: Branch such as Computer Science,
                      Electrical Engineering.
         value class: ACADEMIC_BRANCHES
      Academic_degree
         description: Degree such as Bachelor of
                      Science, Master of Science,
                      Engineering, Doctorate.
         value class: ACADEMIC_DEGREES
      AID
         description: Military_ID of the officer who
                      earned that degree.
         value class: MID
         mandatory
      Date
         description: Date at which the degree was earned
         value class: DATE
      Name_of_university
         value class: UNIVERSITY_NAMES
      Location_of_university
         value class: COUNTRY
   identifiers:
      Academic_branch + Academic_degree + AID

MILITARY_EDUCATION/COURSES
   description: Information about the military
                school graduated or military course
                attended, location of school, grade
                and date of graduation.
   member attributes:
      Course/Military_school_code
         value class: COURSE/SCHOOL_CODE
      Location
         value class: COUNTRY
      MEID
         description: Military_ID of the officer who
                      attended the school or course.
         value class: MID
         mandatory
      Course/School_title
         value class: COURSE/SCHOOL_TITLES
      Description
         description: Textual explanation of the course
         value class: COURSE/SCHOOL_DESCRIPTION
      Duration
         description: Duration of the military
                      education in weeks.
         value class: INTEGERS
      Date
         description: Graduation date of an officer
                      from the course or school.
         value class: DATE
      Grade
         description: Grade earned for that course
                      or military education.
         value class: COURSE_GRADES
   identifiers:
      Course/Military_school_code + Location + MEID
MEDICAL_INFO
   description: Medical information and overall
                medical status for an officer.
   member attributes:
      Medical_report_number
         description: Medical report number of the last
                      checkup of the officer.
         value class: REPORT_NUMBER
      HID
         description: Military_ID of the officer to
                      whom the information belongs.
         value class: MID
         mandatory
      Date
         description: Date of the report.
         value class: DATE
      Height
         description: Height of the officer.
         value class: HEIGHT
      Weight
         description: Weight of the officer.
         value class: WEIGHT
      Blood_pressure
         value class: BLOOD_PRESSURE
      Eye_condition
         description: Describes the condition of both
                      eyes of the officer.
         value class: EYE_CONDITION
      Ear_condition
         description: Describes the condition of both
                      ears of the officer.
         value class: EAR_CONDITION
      Internal
         description: Describes the condition of
                      internal organs of the officer.
         value class: INTERNAL_CONDITION
      General_health_status
         description: An overall evaluation of
                      conditions of all body parts.
                      This status is described by some
                      member attribute values of this
                      entity class.
         value class: HEALTH_STATUS
   identifiers:
      Medical_report_number + HID

FOREIGN_LANGUAGE
   description: It is used to define the officer's
                foreign language capability.
   member attributes:
      Foreign_language
         value class: LANGUAGES
      FID
         description: Military_ID of the officer who
                      has the language capability.
         value class: MID
         mandatory
      Degree_of_capability
         value class: LANGUAGE_CAPABILITY
   identifiers:
      Foreign_language + FID


UNIT
   description: Description of a unit: unit code,
                unit name, unit category, location,
                superior unit, unit status and
                officers assigned to the unit.
   member attributes:
      Unit_code
         value class: UNIT_CODE
         mandatory
         not changeable
      Name
         value class: UNIT_NAMES
      Unit_category
         description: Organizational level of unit
                      such as corps, brigade,
                      division.
         value class: UNIT_CAT
      Location
         description: Location of unit.
         value class: UNIT_LOCATION
      Superior_unit
         description: The unit which has command and
                      control of this unit.
         value class: UNIT
      Unit_function
         description: Type of function or service which
                      the unit performs.
         value class: UNIT_FUNC
      Officer_assigned
         description: Officers who are assigned to this
                      unit.
         value class: OFFICER
         inverse: Unit_assigned
         multivalued
   identifiers:
      Unit_code


ASSIGNMENT_REQUEST
   description: The request, made by any unit,
                for officers who have certain
                specifications fit for a specific
                position to be assigned.
   member attributes:
      Unit_code
         description: The unit which issued the
                      request for assignment.
         value class: UNIT_CODE
         mandatory
         not changeable
      Request_number
         description: A number which is given by the
                      unit which issued the request.
         value class: REQUEST_NO
         mandatory
         not changeable
      Date
         value class: DATE
      Rank
         description: Rank of the officer who is
                      requested for assignment.
         value class: RANK
      Primary_branch_requested
         value class: BRANCHES
      Secondary_branch_requested
         value class: BRANCHES
      Academic_major_requested
         description: Academic major and degree for
                      this assignment
         value class: ACADEMIC_MAJOR
         multivalued
      Military_course/education_requested
         description: Military education and/or course
                      which is needed for this
                      assignment.
         value class: MILITARY_EDUCATION/COURSES
         multivalued
      Medical_status
         description: Lowest value for medical status
                      which is needed for this
                      assignment.
         value class: HEALTH_STATUS
      Number_of_person
         description: Number of officers requested with
                      this assignment request.
         value class: INTEGERS
   class attributes:
      Number_of_requests
         description: The number of requests
                      issued in the current year.
         derivation: Number of members in this class
                     where Date = current year.
   identifiers:
      Unit_code + Request_number
MID
   interclass connection: subclass of STRINGS where
      format is 5 digit numbers

RANK
   interclass connection: subclass of STRINGS where
      specified

DATE
   interclass connection: subclass of STRINGS where
      format is:
         month: number where >= 1 and <= 12
         "/"
         day: number where integer and >= 1 and <= 31
         "/"
         year: number where integer and > 1900 and < 2000
      where (if (month = 4 or = 6 or = 9 or = 11) then
         day <= 30) and (if month = 2 then day <= 29)
      ordering by year, month, day

PERSON_NAMES
   interclass connection: subclass of STRINGS

COUNTRY
   interclass connection: subclass of STRINGS where
      specified

SEX
   interclass connection: subclass of STRINGS where
      format is 1 character: m, f

MARITAL_STATUS
   interclass connection: subclass of STRINGS where
      format is 1 character: S, M, D,

ADDRESS
   interclass connection: subclass of STRINGS

BRANCHES
   interclass connection: subclass of STRINGS where
      specified

ACADEMIC_BRANCHES
   interclass connection: subclass of STRINGS where
      specified

ACADEMIC_DEGREES
   interclass connection: subclass of STRINGS where
      values are: BA, BS, MA, MS, ENG, PhD

UNIVERSITY_NAMES
   interclass connection: subclass of STRINGS

COURSE/SCHOOL_CODE
   interclass connection: subclass of STRINGS where
      format is 5 characters

COURSE/SCHOOL_TITLES
   interclass connection: subclass of STRINGS
COURSE/SCHOOL_DESCRIPTION
   interclass connection: subclass of STRINGS

COURSE_GRADES
   interclass connection: subclass of STRINGS where
      format is 2 characters

REPORT_NUMBER
   interclass connection: subclass of STRINGS where
      format is 6 digit number

HEIGHT
   interclass connection: subclass of STRINGS where
      format is positive integer

WEIGHT
   interclass connection: subclass of STRINGS where
      format is positive integer

BLOOD_PRESSURE
   interclass connection: subclass of STRINGS where
      specified

EYE_CONDITION
   interclass connection: subclass of STRINGS where
      specified

EAR_CONDITION
   interclass connection: subclass of STRINGS where
      specified

INTERNAL_CONDITION
   interclass connection: subclass of STRINGS where
      specified

HEALTH_STATUS
   interclass connection: subclass of STRINGS where
      format is 2 digit number

LANGUAGES
   interclass connection: subclass of STRINGS where
      specified

LANGUAGE_CAPABILITY
   interclass connection: subclass of STRINGS where
      format is 1 digit number

UNIT_CODE
   interclass connection: subclass of STRINGS where
      specified

UNIT_NAMES
   interclass connection: subclass of STRINGS

UNIT_CAT
   interclass connection: subclass of STRINGS where
      format is 3 characters: COR, DIV, BRI, REG, ETE

Figure A.9    Domains of Attributes (cont'd).
UNIT_LOCATION
   interclass connection: subclass of STRINGS where
      specified

UNIT_FUNC
   interclass connection: subclass of STRINGS where
      format is 6 characters

REQUEST_NO
   interclass connection: subclass of STRINGS where
      format is 6 digit number


Figure A.10    Domains of Attributes (cont'd).
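
A few of the domain definitions above translate directly into validation checks. The Python sketch below is illustrative only: the function names are hypothetical, the year bounds are taken as strict inequalities as they appear in the scanned DATE definition, and the 30-day rule is applied to the calendar's 30-day months (April, June, September, November).

```python
import re

# Value set of the ACADEMIC_DEGREES domain.
ACADEMIC_DEGREES = {"BA", "BS", "MA", "MS", "ENG", "PhD"}


def is_valid_mid(value):
    """MID: a string of exactly 5 digits."""
    return re.fullmatch(r"[0-9]{5}", value) is not None


def is_valid_sex(value):
    """SEX: 1 character, m or f."""
    return value in ("m", "f")


def is_valid_date(month, day, year):
    """DATE: range checks plus the month-length constraints of the domain."""
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return False
    if not (1900 < year < 2000):        # assumed strict, as in the scan
        return False
    if month in (4, 6, 9, 11) and day > 30:
        return False
    if month == 2 and day > 29:
        return False
    return True


def date_sort_key(month, day, year):
    """Key implementing 'ordering by year, month, day'."""
    return (year, month, day)


checks = [
    is_valid_mid("27363"),
    is_valid_mid("2736"),
    is_valid_date(8, 6, 1975),
    is_valid_date(2, 30, 1975),
    "MS" in ACADEMIC_DEGREES,
]
print(checks)
```

As in the SDM definition, the check accepts 29 February in every year; a leap-year test would be an additional constraint not stated in the domain.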
APPENDIX B
SAMPLE RELATIONS FOR PERSONNEL DATABASE

This appendix shows the sample relations of the Personnel
Database on which the implementation of the database
is based.

1.  Relation OFFICER:


UFI> SELECT *
  2  FROM officer;


MID    RANK  ONAME          SEX  PRI_BRAN     SEC_BRAN  3T-IESS

27363  capt  Johnson        m    artill       pilot
12239  maj   Hernandez      m    inftry       spefc
32458  1lt   ^OOBi ns       f    airdef       adp
43596  2lt   Smith          m    medic        pilot
10999  lcol  Brown          m    inftry       adp
35768  1lt   3 r seno er q  m    signal       pilot
29364  capt  James          m    n \ 1 e-iq   spefc
16745  maj   '.*' "5 fit on m    financ       adp
10792  col   Stone          m    ordnan       artill

9 records selected.


2.  Relation UNIT:


UFI> SELECT *
  2  FROM unit;


UCODE  UNAME            UCAT  ULOC          SUP_UNIT  UFUNC

01DIV  1st Inf Div      div   Ft.Riley,KS   ORArfM    comba
09BRG  9th Art Brg      brg   Ft.Riley,KS   01DIV     comsu
64DEP  64th Ord Depot   dep   Ft.Riley,KS   01DIV     comse
12HOS  12th Md Hosp     hosp  Ft.Riley,KS   01DIV     comse
02AVI  2nd Avia Unit    reg   Ft.Gordon,GA  03DIV     comba
07SGP  7th Spe Fcs Gp   Dt    Ft.Bragg,NC   82DIV     comba
20ABT  20th Airdef Bt   bt    Ft.Hood,TX    02DIV     comsu
03EBN  3rd Eng Bn       bn    Ft.Hood,TX    02DIV     comba
02DIV  2nd Inf Div      div   Ft.Hood,TX    07ARM     comba

9 records selected.
3.  Relation A_EDUCATION:


UFI> SELECT *
  2  FROM A_EDUCATION;


ABRAN  ADEG  AID    UNIV         GDATE

math   BS    27363  UCLA,CA      06-AUG-75
law    BA    12239  Purdue,IN    30-JUL-70
ee     BS    32458  Seattle,WA   21-SEP-80
dent   BS    43596  Berkeley,CA  25-MAY-60
mngt   BA    10999  Loyola,IL    01-JUL-68
cs     MS    10999  NPGS,CA      07-DEC-73
ee     BS    35768  Rice,TX      06-MAY-80
cons   BS    29364  MIT,MA       09-DEC-78
ee     MS    29364  Ohio,OH      10-JUL-85
mngt   BA    16745  Cornell,NY   30-SEP-75
me     BS    10792  Rutgers,NJ   31-AUG-65
ee     MS    10792  NPGS,CA      01-MAR-70

12 records selected.


4.  Relation M_EDUCATION:


UFI> SELECT *
  2  FROM M_EDUCATION
  3  ORDER BY CCODEA, CGRADE;


CCODEA  MEID   CGRADE  CDATE

AA102   43596  A-      12-JUL-3a
AD002   32458  A       23-NOV-92
AS003   27363  B+      13-APR-77
CS302   32458  B+      26-OCT-91
CS509   10999  A-      3)-JAN-77
HS706   43596  A       22-JUN-92
IA076   16745  A-      22-NOV-76
IS005   10999  A       1 i-nci-70
IS005   12239  A-      30-SEP-72
OC092   10792  A-      01-DEC-69
SS002   35768  B+      26-FEB-82

11 records selected.
5.  Relation M_COURSES:


UFI> SELECT *
  2  FROM M_COURSES;


CCODEB  CLOCB           CTITLE     CDESC                        CDUR

AS003   Ft.Sill,OK      artillery  Army Field Artillery School    48
AA102   Ft.Rucker,AL    aviation   Army Aviation School           52
AD002   Ft.Bliss,TX     airdef     Army Air Defense School        36
CS302   Monterey,CA     adp        ADP Officer Course             13
CS509   Ft.Harrison,IN  adp        ADP Officer Course             13
IA076   Ft.Harrison,IN  admin      Ins.^or Administration         30
IS005   Ft.Benning,GA   infantry   Army Infantry School           46
HS706   Ft.Houston,TX   health     Aca. of Health Sci.            96
OC092   Aberdeen,MD     ordnance   Army Ord. and Chem. School     20
SS002   Ft.Gordon,GA    signal     Army Signal School             50

10 records selected.


6.  Relation LANGUAGE:


UFI> SELECT *
  2  FROM LANGUAGE;


NLANGUAGE  FID    LDEGREE

german     27363  D
french     12239  4
russian    12239  7
korean     16745  2
german     35768  5
turkish    32458  4

6 records selected.
7.  Relation MEDICAL:


UFI> SELECT *
  2  FROM medical;


REPNO    HID    RDATE      EYECOND  EARCOND  HSTAT  OTHERS

98381-7  12239  30-NOV-81        0        0      0
13282-5  32458  01-MAR-82       12       11      2
24582-6  43596  30-AUG-82        0       11      1
3758a-t  27363  07-JUN-84       11        0      1
48584-5  35768  31-AUG-84       10        0      1
12885-4  29364  09-MAR-85        0        0      0
20985-6  16745  14-APR-85       11       11      2
24580-0  10999  17-NOV-80       22        0      2
37580-8  10792  06-DEC-80       15       21      6
25681-2  27363  04-OCT-81        0        0      0

10 records selected.

8.  Relation ASSIGNMENT:


UFI> SELECT *
  2  FROM ASSIGNMENT;


AMID   A_UCODE  ORDERNO   ASGDATE

10792  09BRG    038165-1  01-SEP-6ft
10999  01DIV    327067-8  20-SEP-69
16745  01DIV    456277-3  29-DEC-77
27363  09BRG    321578-7  12-SEP-78
29364  64DEP    595879-5  08-MAR-79
12239  02DIV    a9i373-6  30-OCT-75
32458  20ABT    482683-2  10-JAN-85
43596  12HOS    321782-4  11-AUG-82
35768  01DIV    152282-9  01-APR-82
10792  02DIV    320871-5  11-JAN-71
10999  02DIV    118273-5  30-AUG-73
10792  64DEP    320678-1  15-APR-78

12 records selected.
9.  Relation ASGREQ:


UFI> SELECT *
  2  FROM ASGREQ;


R_UCODE  REQNUM    REQDATE    R_RANK  R_PRIBR  R_SECBR  R_ACABR  R_MRED  R_HSTAT  NUMOFPERS

12HOS    031081-5  30-DEC-64  maj     financ   adp      mngt     IA076                    2
20ABT    922185-7  06-MAR-85  capt    artill   pilot    ee       AS003                    1
20SGP    327685-4  27-APR-85  maj     inftry   spefc    la-      IS005                    1